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Abstract 


This  thesis  investigates  the  process  of  designing 
queueing  network  models  of  computer  systems,  performing  the 
required  measurements  of  the  systems,  and  specifying  the 
parameters  for  the  models.  Thus,  it  complements  the 
literature  about  queueing  network  models,  which  to  a  great 
extent  deals  with  the  mathematical  representation  of  the 
computer  systems  or  with  solution  algorithms  for  the  models. 

The  methodology  starts  out  with  the  definition  of  the 
models  as  requited  by  the  performance  measures  to  be 
predicted.  The  modelling  steps  from  the  measurements  of  the 
actual  computer  system  to  the  specification  of  the  input 
parameters  for  the  solution  algorithms  are  described.  The 
implications  of  the  abstraction  steps  and  the  underlying 
assumptions  are  made  apparent.  An  intermediate  system  model 
is  introduced  that  represents  the  computer  system  on  a 
logical  level,  separating  the  three  major  components  of 
influence  on  the  performance  measures:  the  hardware,  the 
software,  and  the  workload.  This  separation  facilitates 
representation  of  system  changes  in  the  model.  For 
different  levels  of  modelling  accuracy,  it  is  shown  what 
system  measurements  are  required  and  by  what  monitoring 
tools  they  can  be  obtained.  Special  consideration  is  given 
to  easily  usable  and  available  monitors.  The  feasibility  of 
the  methodology  developed  is  demonstrated  by  examples. 
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1 .  In t rod uc tion 


1*1  Queueing  Network  Models  of  Computer  Systems 


The  overall  costs  of  computer  systems  and  their 
applications  are  constantly  increasing.  Consequently,  the 
needs  for  efficient  allocation  of  the  resources  available 
and  for  planning  ahead  to  meet  future  requirements  are  also 
increasing.  To  meet  these  demands  we  need  tools  for 
evaluating  the  efficiency  of  existing  systems  and  for 
projecting  their  performance  under  changes  of  the  systems 
and  their  environments  that  might  occur  in  the  future. 
Queueing  network  models  are  becoming  more  and  more  accepted 
as  such  tools  to  evaluate  and  predict  computer  system 
performance. 

In  a  queueing  network  model,  the  essential  parts  of  a 
computer  system  are  represented  by  a  network  of  servers  and 
queues  that  are  connected  by  transition  paths.  Tokens  that 
symbolize  tasks  (programs)  circulate  through  the  network, 
their  flow  being  controlled  by  the  probabilities  of  their 
transitions  from  one  server  to  another.  Their  service 
requirements  at  each  of  the  servers  are  determined  by 
probability  distributions.  By  specifying  these 
probabilistic  model  parameters  according  to  the 
corresponding  parameters  of  an  actual  computer  system,  we 
obtain  a  queueing  network  model  that  reflects  the  important 
aspects  of  the  computer  system’s  configuration  and 
operation . 

Queueing  network  models  can  be  used  to  obtain  insight 
into  a  computer  system’s  behaviour  and  its  reaction  to  its 
workload.  They  also  can  be  applied  to  locate  performance 
problems  such  as  bottlenecks  in  a  system.  Their  greatest 
power,  however,  lies  in  their  ability  to  adapt  to  projected 
changes  in  a  system  or  in  its  workload,  and  to  predict  the 
system’s  performance  under  these  changes. 
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There  are  two  approaches  to  how  aodels  can  be  employed 
fcE  perfcrraance  analysis*  In  the  relative  a pproachy  a 
particular  real  or  planned  system  is  not  modelled,  but 
instead  special  features  of  computer  systems  are 
investigated  in  a  rather  general  context.  The  reaction  of 
the  model  to  parameter  changes  is  analyzed,  and  the  results 
can  be  interpreted  only  relative  to  each  other.  Using  the 
relative  approach  one  can,  for  example,  study  the  benefits 
of  having  cne  ccmmon  queue  for  several  equal  servers  instead 
of  a  queue  for  each  server.  One  can  alsc  compare  the 
performance  of  scheduling  algorithms.  The  predictive 
ability  of  the  relative  approach  for  a  particular  system  is 
limited.  It  can,  at  best,  yield  ratios  of  change  or  bounds 
on  changes  for  certain  performance  measures. 

In  the  absolute  approach ,  a  particular  computer  system 
is  modelled  as  accurately  as  possible.  The  parameters  are 
either  based  on  measurements  of  an  actual  system  or,  it  the 
modelled  system  cannot  be  measured,  on  estimates.  The 
results  of  the  models  are  absolute  performance  measures.  If 
it  is  possible,  an  absolute  model  should  be  validated  before 
projected  system  changes  are  adopted  and  the  predictions  are 
calculated.  The  distinction  between  the  absolute  and  the 
relative  approach  is  based  on  the  notions  of  absolute  and 
relative  performance  found  in  [W0GP71]. 


1.2  Performance  Evaluation  Concepts 


Besides  queueing  network  models,  there  exists  a  wide 
variety  of  other  performance  evaluation  techniques  for 
computer  systems.  In  the  following  section  we  will 
establish  the  context  in  which  queueing  network  models 
should  be  viewed. 

Current  computer  systems  are  too  complex  for  analysts  to 
comprehend  by  inspection  their  dynamics  and  the  various 
influences  on  their  performance.  Thus,  the  analysts  have  to 
resort  to  models  that  represent  only  those  aspects  of  the 
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systems  that  are  essential  for  a  particular  investigation. 
Models  are  simpler  in  structure  and  behaviour  than  the 
systems  they  represent.  This  makes  it  easier  to  analyze 
them.  From  the  results,  we  can  draw  conclusions  about  the 
actual  systems.  There  are  two  main  categories  of  models 
employed  for  performance  evaluation:  analytic  models  and 
simulation  models.  Both  model  categories  can  be  used  with 
either  the  realtive  or  the  absolute  approach  for  trend 
analyses  or  performance  predictions  respectively. 


Analy  t ic  models,  such  as  queueing  network  models, 
regression  models,  or  linear  optimization  models,  capture 
important  parameters  of  a  computer  system  and  relate  them  to 
each  ether  by  mathematical  formulas.  They  are  evaluated  by 
computational  algorithms.  Due  to  the  mathematical  aspects 
of  their  solution  algorithms,  they  frequently  require 
restrictive  assumptions  that  are  normally  not  satisfied  by 
the  actual  computer  systems.  Their  complexity  and  the 
amount  of  detail  they  can  cover  are  limited  by  the 
feasibility  of  the  mathematical  solutions.  Despite  these 
disadvantages  they  have  proved  to  be  very  useful  and  cost- 
effective  tools  for  providing  estimates  of  important 
performance  measures.  Once  the  models  are  established,  they 
are  easy  and  inexpensive  to  apply,  and  the  results  can  be 
produced  very  quickly.  In  many  cases,  the  parameter 
estimation  -  a  problem  for  both  modelling  categories  -  may 
introduce  more  error  than  unrealistic  assumptions  or 
omission  of  details. 


In  simulation  models  the  operation  of  a  computer  system 
is  simulated  by  a  program.  The  amount  of  detail  that  can  be 
captured  is  virtually  unlimited,  and  a  wide  range  of 
performance  measures  can  be  recorded.  However,  simulation 
models  can  be  very  expensive  both  to  design  and  to  use.  The 
repeated  application  of  a  model  can  be  severely  limited  by 
its  computation  costs.  The  amount  of  detail  absolute 
simulation  models  can  represent  realistically  is  restricted 
by  the  accuracy  with  which  the  parameters  can  be  specified 
from  the  measurement  data  of  the  actual  system. 
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Measurements  sometimes  can  identify  performance  problems 
of  computer  systems  in  their  current  state,  but  they  have  no 
predictive  ability.  Thus  they  cannot  be  regarded  as  a 
performance  evaluation  method  equivalent  to  modelling. 
Measurements  are  most  efficiently  employed  in  the 
performance  evaluation  cf  computer  systems  if  they  are  used 
as  a  basis  for  models  using  the  absolute  approach. 


1.3  Goals  and  Plan  of  the  Thesis 


This  thesis  is  intended  to  provide  insight  into  the 
application  oriented  aspects  of  queueing  network  models  of 
computer  systems.  It  concentrates  on  the  development  and 
the  validation  of  absolute  models.  Thus  it  contrasts  with 
and  complements  the  recent  important  research  on  the 
mathematical  aspects  of  queueing  network  models  and  on 
relative  models. 


First,  we  give  an  overview  ov 
available  by  classifying  th 
capabilities  and  the  effort  they 
we  discuss  the  measurement  tools 
provide  the  data  upon  which  a 
based.  Special  consideration 
distributed  and  easily  usable 
measurement  requirements  of  se 
specified  (chapters  4,  5  and 

characteristics  of  a  monitor  spec 
network  models  are  proposed  (chap 


er  the  modelling  techniques 
em  according  to  their 
require  (chapter  2) .  Then, 
and  techniques  that  can 
bsolute  queueing  models  are 
will  be  given  to  widely 
monitors  (chapter  3)  .  The 
veral  model  types  are 
6),  and,  as  a  conclusion, 
ially  designed  for  queueing 
ter  7)  . 


The  central  goal  of  this  thesis  is  to  develop  a 
methodology  for  the  modelling  process  from  the  design  of  a 
model,  through  the  measurements  of  the  computer  systems,  to 
the  parameter  specification  for  the  solution  algorithm. 
This  modelling  framework  is  intended  to  make  the  decisions 
involved  more  transparent,  to  provide  a  formalism  for  the 
parameter  specification,  and  to  support  the  representation 
of  system  changes  in  the  model  (chapter  4) . 
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Finally,  we  demonstrate  the  methodology  by  developing 
and  validating  an  absolute  model  that  is  comparable  in 
complexity  with  recently  published  models  (chapter  5) •  The 
practical  aspects  of  extensions  to  this  model  that  provide 
mete  accuracy  and  represent  more  features  of  computer 
systems  are  outlined  briefly  (chapter  6). 


2,  Survey  of  Modelling 


1 1 


The  great  variety  of  queueing  models  and  related  results 
can  be  surveyed  only  very  briefly  in  this  chapter#  For  a 
mere  complete  treatment  the  references  given  throughout  the 
chapter  should  be  consulted# 


2^1  Basic  Calculations 


Before  proceeding  to  actual  queueing  models,  we 
introduce  some  basic  results  for  queueing  systems#  These 
results  do  not  require  assumptions  as  restrictve  as  solvable 
'  queueing  models  do,  but  hold  for  a  wide  range  of  queueing 
systems.  They  give  rough  estimates  of  some  parameters  of 
queueing  systems  and  are  useful  for  consistency  checks  of 
the  input  parameters  and  the  results  of  queueing  models#  A 
good  account  of  most  of  the  material  covered  in  this  section 
can  be  found  in  [KLEI75  and  KLEI76]. 


Little*s  Formula 

One  of  the  most  general  and  most  widely  used  formulas  in 
queueing  theory  is  Little's  formula  [LITT61]: 

K  =  L  ♦  H 

where  N  is  the  mean  number  of  customers  in  the  system,  L  is 
the  mean  arrival  rate,  and  W  is  the  mean  time  in  system#  It 
requires  no  specific  assumptions  about  the  arrival  process 
or  the  service  process,  except  that  the  system  must  be  work 
conserving,  and  it  must  not  be  saturated.  A  system  is  w ork 
conserving  if  no  service  requirements  are  created  or 
destroyed  within  the  system.  This  includes  the  constraint 
that  the  server  must  not  be  idle  as  long  as  there  are 
customers  in  the  queue  [KLEI76]#  A  system  is  saturated  if 
the  mean  interarrival  time  is  smaller  than  or  equal  to  the 
mean  service  time#  'System*  can  have  many  meanings  for 
Little's  formula;  It  may  be  a  server  only,  a  server  and  its 
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queue,  or  an  entire  queueing  network.  Little's  formula  is 
generally  used  in  queueing  applications  to  calculate  the 
expected  time  in  system. 


Klein  rock's  Conservation  Law 
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Saturation  Points  of  Queues 


For  a  single  server  queueing  system  with  a  finite  source 
of  customers,  Kleinrock  [KLEI68]  defines  the  saturating 
number  of  customers  in  the  system  as  follows: 

1  1 

I  u  u 

Ns  = - =  1  ♦ - 

1  L 

u 

where  Ns  is  the  saturating  number  of  customers,  u  is  the 
service  rate,  and  L  is  the  arrival  rate  to  the  system.  The 
asymptotic  response  time  for  large  N  is  shown  to  be: 


N  -  Ns  1 

T(N)  - - 

u 
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Thus,  T(N)  is  directly  proportional  to  N*  Note  that  the 
Bean  response  time  in  such  a  system  remains  finite  no  matter 
how  large  the  number  of  customers  is.  Buzen  refers  to  this 
fact  as  the  "limited  damage"  property  of  closed  systems 
[EUZE77].  In  a  system  with  an  infinite  source  of  customers 
saturation  has  a  different  meaning.  An  arrival  rate  larger 
than  the  service  rate  leads  to  an  infinite  queue  length  and, 
as  can  be  verified  easily  with  Little's  formula,  to  an 
infinite  response  time. 


If  the  service  rate  and  the  arrival  rate  are  known,  the 
calculation  of  the  saturation  point  can  be  used  as  a  rough 
estimate  of  the  capacity  of  a  system.  The  parameters  for 
this  estimation  are  hard  to  find  if  no  measurements  are 
available.  One  could  argue  that,  if  the  parameters  are  known 
with  some  confidence,  we  can  get  more  detailed  results  by 
using  them  in  a  simple  machine  repairman  model  [KLEI75]. 
However,  the  machine  repairman  model  requires  the  assumption 
of  exponential  distributions  of  the  interarrival  times  and 
service  .times,  whereas  the  saturation  point  approach 
requires  only  that  the  service  discipline  be  work 
conserving. 


Asymptotic  Behaviour  of  Closed  Queueing  Networks 


Muntz  and  Wong  [MW  74]  have  made  similar  observations 
about  closed  queueing  networks.  A  central  variable  in  their 
formulas  is  the  relative  utilization  of  a  server.  The 
relative  utilizations  of  two  devices  are  guaranteed  to  be  in 
the  same  ratio  as  the  total  times  spent  by  customers  on 
average  at  these  devices.  Muntz  and  Wong  have  derived 
several  results: 

1)  The  server  s  with  the  largest  relative  utilization,  Xs, 
is  the  limiting  resource  in  a  closed  queueing  system. 
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2)  The  saturating  number  of  customers  is: 

M 

Zxi 


Ns  = - ♦ 

Xs 


1=1 


where  Xi  is  the  relative  utilization  of  resource  i,  s  is  the 
index  of  the  limiting  resource^  and  M  is  the  number  of 
devices  in  the  network.  This  definition  of  the  saturating 
number  of  customers  corresponds  directly  to  the  definition 
for  the  single  server  system  in  the  previous  section. 


3)  Let  Ui  (N)  be  the  utilization  of  server  i  when  N  customers 
are  in  the  system.  Then  the  following  equation  holds: 


Ui  (N)  _  Xi 
Uj(N)  ■  Xj 

for  all  N,  i,  and  j.  As  Us(N),  the  utilization  of  the 
saturating  device  with  N  customers  in  the  system,  approaches 
one  with  increasing  N,  the  utilizations  of  the  other  devices 
approach  their  limiting  utilizations,  which  can  be 
calculated  by  the  equation  above.  If  Xj  is  the  largest 
relative  utilization  smaller  than  Xs,  then  the  gain  in 
utilization  by  removing  the  bottleneck  at  device  s  can  be  at 
most  a  factor  of  Xs/Xj. 


4)  For  a  large  number  of  customers  in  the  system  (N  >>  Ns) 
the  time  a  job  takes  for  one  cycle  through  the  system  is 
proportional  to  N,  just  like  for  single  queues.  The 
increase ^in  cycle  time  with  one  additional  customer  is  the 
service  time  of  this  customer  at  the  saturating  device. 
Thus  it  is  easy  to  calculate  an  upper  bound  on  the  reduction 
of  the  cycle  time  by  removing  the  bottleneck. 

These  results  are  easily  obtained  and  they  hold  for  very 
general  cases.  Only  very  weak  assumptions  must  be  made: 
the  queueing  disciplines  have  to  be  work  conserving  and  the 
service  rates  of  the  servers  must  be  independent  of  the 
state  of  the  network. 
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Fundaaental  Laws 

In  his  paper  on  "fundaaental  laws",  Buzen  [EUZE76] 
reviews  the  asyaptotic  properties  of  the  previous  section 
from  the  viewpoint  of  operational  analysis.  He  adds  two  new 
otser vations  that  relate  to  the  memory  usage  of  the  jobs. 

The  space  time  product  throughput  law  is  stated  as: 

M 

Y 

where  X  is  the  throughput,  M  is  the  average  amount  of  memory 
available,  and  Y  is  the  average  space  time  product  per  job. 

The  space  time  product  response  time  law  for  interactive 
systems  is  stated  as: 

N  ♦  y 

R  = -  Z 

M 

where  R  is  the  mean  response  time,  N  is  the  mean  number  of 
active  terminals,  and  Z  is  the  mean  think  time  (the  mean 
time  the  jobs  dwell  at  the  terminals)  • 

These  laws  hold  under  very  general  conditions  as  well. 


2t.2  Classification  of  Queueing  Models 

In  recent  years,  a  large  number  of  queueing  models  have 
been  developed.  These  range  from  very  simple  models  to  very 
complei  and  detailed  models.  In  the  following,  we  develop  a 
common  framework  within  which  the  models  can  be  defined. 
This  gives  us  a  more  systematic  view  of  the  great  variety  of 
queueing  models  and  makes  it  easier  to  relate  them  to  each 
other.  To  do  this,  we  concentrate  on  the  mathematical 
aspects  of  the  queueing  models  and  disregard  their 
applications  and  the  mapping  from  real  systems  into  the 
models.  These  aspects  will  be  treated  later. 
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Queueing  oodels  can  be  classified  according  to  the 
following  properties: 


•  model  structure 

•  workload  classes 

•  service  disciplines 

•  service  demand  description 

•  server  character isitics 


These  properties  can  be  viewed  as  five  dimensions  that 
build  a  model  space.  The  first  two  dimensions  are  global 
dimensions  for  the  entire  model,  and  the  other  three  are 
server  oriented  dimensions  that  have  to  be  specified  for 
each  server  separately.  The  values  in  the  domain  of  a 
dimension  represent  the  different  levels  of  accuracy  with 
which  the  corresponding  properties  of  the  real  system  can  be 
matched  by  the  model.  The  values  of  a  domain  can  be  ordered 
according  to  their  increasing  capability  to  model  a  real 
system  accurately.  This  means  in  most  cases  also  an  ordering 
according  to  the  increasing  complexity  of  the  models  and  the 
increasing  cost  of  their  solution. 


In  the  following,  the  domains  o 
defined  considering  the  most  comm 
This  definition  is  not  meant  t 
domains  may  certainly  be  extended  t 
values. 


f  the  five  dimensi 
only  used  model 
o  be  exhaustive, 
o  cover  a  wider  ra 


ons  are 
types, 
and  the 
nge  of 


Hodel  Structure 

The  domain  of  the  model  structure  dimension  contains  the 
following  values: 

•  A  single  s^ver  model  consists  of  one  server  and  its 
queue.  It  may  have  a  feedback  loop  through  which 
partially  completed  tasks  can  be  routed  back  to  the 
queue . 
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•  A  cyclic  queueing  aodel  is  a  closed  model  whose 
servers  and  queues  are  arranged  in  a  cycle.  A  model 
is  closed  if  no  customers  can  arrive  at  the  system  or 
leave  it.  Otherwise  it  is  open .  The  number  of 
customers  in  a  closed  network  is  constant. 

•  A  central  server  model  contains  one  central  server  and 
several  peripheral  servers,  each  having  its  own  queue. 
Customers  can  reach  the  peripheral  servers  only  from 
the  central  server,  and  upon  completion  at  a 
peripheral  server  they  can  proceed  only  back  to  the 
central  server.  The  number  of  customers  cycling 
through  a  central  server  network  is  constant  because 
it  is  a  closed  network. 

•  A  general  queueing  network  may  be  either  open  or 
closed.  Customers  can  be  routed  arbitrarily  among  the 
servers • 


A  hierarchical  mo^l  consists  of  several  queueing 
models  on  different  levels  of  detail.  Subsystems  of  a 
queueing  system  are  modelled  by  separate  closed 
network  models  on  a  detailed  level.  In  a  network 
model  on  a  higher  level,  each  subsystem  is  then 
represented  by  a  server  whose  behaviour  is  determined 
by  the  more  detailed  model  on  the  lower  level.  A 
hierarchical  model  can  have  two  or  more  levels. 


Workload  Classes 

The  values  of  the  workload  classes  domain  are  the  three 
ways  to  distinguish  among  jobs  of  different  workload 
characteristics. 

•  In  single  class  models  all  jobs  are  assumed  to  be 
statistically  identical. 

•  multiple  class  models,  each  class  of  customers  can 
have  different  characteristics.  Customers 
change  their  classes. 


cannot 
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In  a  model  that  has  multiple  classes  with  class 
changes,  the  customers  can  change  class  at  each 
transition  between  two  servers. 


Service  Disciplines 


Disciplines  generally  used  in  queueing  models  have  two 
common  properties:  1)  they  are  work  conserving,  and  2)  they 
assume  no  knowledge  of  the  service  requirements  of  a  single 
customer  in  that  all  customers  in  the  same  class  are  treated 
equally,  and  they  are  assumed  to  be  statistically  identical. 

The  service  discipline  domain  contains  the  following 
va lues : 

•  Processor  sharing  (PS)  can  be  viewed  as  a  round  robin 
discipline  with  a  quantum  length  approaching  zero. 

•  1?2  gueuein  g  11121  assumes  that  every  customer  has  its 
own  server,  so  that  no  queueing  ever  occurs. 

•  i^st  come  first  served  preemptive  £§sume  (LCFSPR) 
assigns  the  server  immediately  to  newly  arriving 
customers.  If  the  server  is  busy  at  the  arrival  of  a 
new  customer  the  customer  currently  being  served  is 
preempted  and  is  resumed  after  that  customer  has  been 
completed, 

•  fi^st  come  first  served  (FCFS)  schedules  customers  in 
the  order  of  their  arrival. 

•  Priority  disciplines  schedule  customers  according  to 
priorities  assigned  to  each  class.  Within  each  class 
customers  may  be  scheduled  either  by  FCFS  or  PS. 

The  first  three  disciplines  share  the  important  property 
that  customers  arriving  in  a  Poisson  stream  at  a  server  also 
leave  it  in  a  Poisson  stream,  even  when  the  server  has  an 
arbitrary  service  time  distribution  [nuNT73].  Thus  the 
disciplines  most  frequently  used  in  queueing  models  can  be 
divided  into  three  classes:  The  Poisson  stream  preserving 
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d isciplines  (also  called  local- balance  disciplines)  ,  FCFS, 
and  priority  disciplines* 


Service  Demand  Description 

The  values  of  the  service  denand  description  domain  are 

-m 

the  levels  of  detail  at  which  the  woricload  of  a  queueing 
system  can  be  described: 

•  The  wor)cload  vector  contains^  for  each  server  and  each 
class,  the  mean  service  requirement  of  a  customer  of 
that  class  at  that  server  during  the  customers  life 
time.  This  description  is  equivalent  to  a 
specification  of  the  service  demand  by  mean  burst 
times  and  transition  probabilities  among  the  servers 
(see  chapter  4) • 

•  The  transition  probabilities  among  the  servers  and  the 

distributions  represented  by  the  method 
of  Cox  [COX55]  give  a  more  detailed  description  of  the 
workload . 


Server  Characteristic 


The  server  characteristic  describes  the  reaction  of  the 
server  to  its  load.  This  domain  contains  three  values: 

•  Lo^d  independent  servers  have  constant  service  rates. 

•  toad  dependen  t  servers  have  service  rates  that  depend 
on  the  number  of  customers  in  service  or  awaiting 
service  at  the  server. 

•  and  class  dependent  servers  have  for  each  class 
a  set  of  service  rates  that  depend  on  the  number  of 
customers  of  that  particular  class  at  the  server. 


20 


Model  Types 

In  order  to  define  a  aodel  by  oeans  of  the  dioensions 
outlined  above,  first  the  two  global  dimensions  must  be 
specified,  and  then  the  server  related  dimensions  must  be 
determined.  As  a  short  form  for  a  model  description, 
subsequently  called  the  model  type,  its  classification 
vector  can  be  specified.  The  classification  vector  contains 
the  values  for  the  global  dimensions,  and  for  each  server 
related  dimension,  the  most  complex  value  that  occurs  at  any 
server  in  the  model.  The  most  complex  value  is  chosen, 
because  the  solution  method  for  a  model  must  be  selected 
according  to  the  most  complex  aspect  of  the  model.  The 
fcllowing  is  an  example  of  the  definition  of  a  model  type: 

Type  =  central  server  model, 

multiple  class  with  no  class  changes, 
local  balance  disciplines, 
workload  vectors, 
load  independent  servers. 


2^3  Survey  of  Model  Types 

In  the  following  section  the  development  of  queueing 
network  models  will  be  outlined  by  surveying  some  of  the 
most  important  models. 


Closed  Queueing  Systems  with  Exponential  Servers 

Gordon  and  Newell  [ GN  67]  were  the  first  to  consider  a 
general  closed  queueing  network.  Their  model  can  be  viewed 
as  a  special  case  of  Jackson's  more  general  open  models 
[JACK63],  Jackson  showed  that  for  a  wide  class  of  queueing 
networks,  the  steady  state  probability  distribution  of  the 
customers  in  the  network  has  a  product  form:  the  joint 
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probability  distribution  of  the  number  of  customers  in  the 
network  is  the  product  of  the  probability  distributions  of 
the  numbers  of  customers  at  the  servers  of  the  network: 

P(Nl,N2,...,,Nm)  =  P(N1)*P(ll2)^...,*P(Nm) 

where  m  is  the  number  of  servers  in  the  network  and  Ni  is 
the  number  of  customers  at  the  ith  server.  This  product 
form  holds  for  Gordon  and  Newell’s  model  as  well.  It 
suggests  that  the  direct  interdependence  among  the  servers 
is  rather  limited  in  this  class  of  models. 

However,  there  exists  no  closed  form  expression  for  the 
steady  state  probability  distribution  or  the  marginal 
distributions. 


Central  Server  Models 


In  ,his  dissertation,  Buzen  defined  the  class  of  central 
server  models  [B0ZE71],  a  class  of  models  that  is 
particularly  well  suited  for  modelling  computer  systems. 
Also,  he  derived  an  efficient  computational  algorithm  to 
calculate  the  utilizations  and  mean  queue  lengths  of  the 
servers,  and,  with  slightly  more  effort,  the  marginal 
probability  distributions  of  the  number  of  customers  in  the 
network.  The  basic  algorithm  for  networks  with  load 
independent  servers  requires  only  a  mild  extension  to  be 
applicable  for  networks  with  load  dependent  servers  as  well. 


Queueing  ,  Networks  in  Local  Balance 


In  [BCMP75]  the  theoretical  background  is  provided  for 
an  important  class  of  queueing  network  models,  the  models 
satisfying  local  balance .  It  is  shown  that  in  all  queueing 
networks  in  local  balance  the  steady  state  probability 
distributions  have  the  product  form.  A  sufficient  condition 
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for  a  network  to  be  in  local  balance  is  that  all  servers 
have  one  of  the  following  coabinations  of  disciplines  and 
service  tiae  distributions: 

•  The  service  discipline  is  FCFS;  the  service  tiae 
distribution  is  exponential  with  the  same  mean  for  all 
customer  classes, 

•  The  service  discipline  is  a  local  balance  discipline; 
the  customer  classes  may  each  have  any  service  time 
distribution  that  has  a  rational  Laplace  transform. 

These  combinations  can  also  be  described  by  two  "doesn't 
matter"  statements: 

•  If  the  service  times  are  distributed  exponentially 
with  the  same  mean  for  all  customer  classes,  the 
discipline  may  be  any  of  the  previously  mentioned 
except  the  priority  discipline. 

•  If  the  discipline  is  a  local  balance  discipline,  the 
service  time  distribution  may  be  any  distribution  with 
a  rational  Laplace  transform,  and  each  customer  class 
may  have  its  own  distribution. 

The  service  times  may  be  dependent  on  the  state  of  the 
network.  If  classes  of  customers  are  distinguished,  the 
custoaers  of  each  class  are  assumed  to  be  statistically 
identical.  The  customers  can  change  their  class  according 
to  fixed  probabilities  at  each  transition  between  two 
servers.  Unfortunately,  not  included  in  the  class  of 
locally  balanced  queueing  networks  are  networks  that  contain 
servers  that  have  a  priority  discipline  or  FCFS  discipline 
and  a  non-exponential  service  time  distribution. 

Efficient  solution  algorithms  for  locally  balanced 
networks  are  given  in  [CHH  75a,  HONG75,  RK  75],  An  overview 
of  numerical  algorithms  for  the  solution  of  queueing 
networks  can  be  found  in  [RK  761. 
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Hierarchical  Models 


In  [CHH75a],  Chandy,  Herzog,  and  Woo  develop  a  nethod 
fcr  solving  hierarchical  lodels  using  Ilorton*s  theoren  for 
electrical  circuits.  Subsystens  of  queueing  networks  are 
analyzed  separately  fro*  the  rest  of  the  network  using 
closed  Bodels.  They  are  evaluated  under  all  potential 
workload  situations,  i.e.,  assuaing  all  possible  numbers  of 
customers  in  the  subsystem.  Then,  in  the  overall  network, 
the  subnetwork  is  replaced  by  a  composite  server  which  has 
as  its  load  dependent  service  rates  the  throughput  rates  of 
the  submodel  under  the  corresponding  number  of  customers  in 
the  subsystem. 

This  method  is  exact  for  local  balance  models,  but  is 
only  an  approximation  for  models  that  do  not  satisfy  local 
balance  [CHW75b].  Wore  general  conditions  under  which 
networks  can  be  decomposed  into  subnetworks  are  given  by 
Courtois  [C0DB72]. 


Pr iority  Scheduling 


Virtually  all  scheduling  and  dispatching  algorithms  in 
actual  computer  systems  contain  priority  disciplines,  so  it 
is  desirable  to  be  able  to  represent  them  in  queueing 
models.  As  local  balance  does  not  hold  in  models  including 
priority  disciplines,  they,  can  be  solved  exactly  only  by 
global  balance  methods  or  related  techniques  whose  costs  are 
prohibitive  in  most  realistic  cases.  However,  there  are 
several  approximation  methods  available  which  are  based  on 
locally  balanced  models. 

Sauer  and  Chandy  [SC  75]  reduce  the  queueing  networks 
using  Norton’s  theorem.  Starting  out  with  multiple  class 
central  server  networks,  the  different  customer  classes  are 
coalesced  to  three  classes:  the  interesting  class  and  two 
composite  classes  which  contain  respectively  all  classes  of 
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higher  priority  and  all  classes  of  lower  priority.  Then  the 
peripheral  serwers  are  reduced  to  one  coaposite  server.  The 
resulting  two  server,  three  class  cyclic  model  is  small 
enough  to  be  solved  by  a  global  balance  technique,  which  can 
accomacdate  priority  disciplines.  This  algorithm  is 
performed  for  each  of  the  customer  classes  in  turn. 

In  a  recent  paper,  Sevcik  [SEVC77a]  suggests  a  series  of 
approximations  for  models  including  priority  disciplines. 
The  upper  and  lower  bounds  provided  by  these  approximations 
are  frequently  quite  tight.  The  most  interesting  method 
(and,  as  it  turns  out,  also  the  best  approximation)  uses  a 
•shadow  CPU*  to  represent  preemptive  priority  scheduling  at 
the  CPU.  There  are  two  CPUs  in  the  model:  the  ••regular”  CPU 
for  the  high  priority  jobs  and  the  shadow  CPU  for  the  low 
priority  jobs.  The  capacity  of  the  shadow  CPU  is  equal  to 
the  capacity  of  the  regular  CPU  times  one  minus  its 
utilization.  This  is  expressed  in  the  model  by  extending 
the  CPU  mean  service  times  of  the  low  priority  jobs.  As 
both  classes  compete  on  an  equal  basis  for  the  other 
servers,  they  are  interacting  heavily,  and  the  utilization 
of  the  high  priority  CPU  is  dependent  on  the  throughput  of 
the  low  priority  jobs.  Thus,  the  ’correct*  extension  factor 
has  to  be  determined  by  an  iterative  algorithm  that,  in 
general,  converges  very  fast.  Reiser  [ 8EIS76  ]  proposes  a 
similar  technique. 


General  Service  Time  Distribution  and  FCFS  Discipline 

In  computer  systems  the  combination  of  FCFS  service 
disciplines  and  non-exponential  service  time  distributions 
at  a  service  center  occurs  quite  frequently.  Unfortunately, 
such  service  centers  cannot  be  represented  in  locally 
balanced  models.  There  are,  however,  approximations 
available  for  these  cases, 

Chandy,  Herzog,  and  Woo  [CHW75b]  have  devised  an 
iterative  algorithm  based  on  Norton^s  theorem  reductions. 
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Assujaing  that  the  network  is  in  local  balance,  a  sequence  of 
reductions  is  performed  in  which  each  server,  in  turn,  is 
represented  by  a  single  server,  and  all  the  other  servers 
are  reduced  to  one  composite  complementary  server.  As  local 
balance  does  not  hold,  but  is  assumed,  the  throughput  values 
of  the  servers  are  inconsistent.  To  achieve  consistency, 
the  service  times  are  adjusted  according  to  an  algorithm, 
and  the  sequence  of  reductions  is  performed  again.  This 
procedure  is  repeated  until  satisfactory  consistency  of  the 
throughput  of  the  servers  is  achieved, 

Sauer  and  Chandy  [SC  75]  apply  a  series  cf  Norton's 
theorem  reductions  to  central  server  networks.  First,  they 
reduce  the  system  to  a  two  class  system  by  coalescing  all 
classes  except  the  interesting  class.  Then  they  assume  that 
the  system  is  in  local  balance  and  reduce  all  the  peripheral 
servers  into  one  composite  I/O  server  and  estimate  its 
coefficient  of  variation  using  a  simple  formula.  The 
resulting  two  class,  two  server  system  is  small  enough  to  be 
solved  ,by  a  global  balance  technique.  This  procedure  is 
applied  in  turn  to  all  customer  classes.  This  technique  can 
also  be  used  to  analyze  subnetworks  of  general  networks  if 
they  have  only  one  input  and  one  output  stream, 

Sevcik,  Levy,  Tripathi,  and  Zahorjan  [SLTZ77]  isolated 
the  three  major  error  sources  in  applying  Norton's  theorem 
to  networks  not  in  local  balance: 

•  The  assumption  of  exponential  service  times  while 

analyzing  the  subsystem  causes  errors  in  the 

throughput  calculation  for  the  subsystem, 

•  The  assumption  that  the  composite  server  has 

exponential  service  time  distributions  is  an  error 
source  in  the  analysis  of  the  reduced  system, 

•  The  assumption,  made  in  the  analysis  of  the  subsystem, 
that  the  input  process  to  the  subnetwork  is  identical 
to  its  output  process  introduces  an  error. 

Based  on  considerations  of  transition  processes 

[SEVC77b]  and  on  simulations,  formulas  for  the  estimation  of 
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the  coefficients  of  variation  of  the  service  tines  in  the 
reduced  networks  are  developed.  These  fornulas  reduce  the 
errors  due  to  the  second  and  the  third  assumptions.  To 
decrease  the  error  due  to  the  first  assumption,  an  improved 
stepwise  reduction  technique  is  proposed  [ ZAH077  and 
GRAH77]):  In  each  step  only  a  small  number  of  servers  are 
aggregated,  and  the  aggregation  is  done  by  a  global  balance 
technigue.  Applying  all  the  proposed  improvements  together 
results  in  remarkably  better  approximations. 


Diffusion  Approximation 


Traditionally  queueing  problems  are  treated  as 
continuous-time,  discrete-state  Markov  processes.  This  view 
leads  to  the  well  known  large  systems  of  linear  equations 
for  the  state  probabilities  which  are,  in  most  realistic 
cases,  too  large  to  be  solved  efficiently.  Another  approach 
to  the  problem  is  to  treat  the  number  of  customers  in  the 
system  as  a  continuous  random  variable.  This  leads  to 
continuous-time,  continuous-state  Markov  processes. 

A  first  order  approximation  to  such  a  process  is  the 
fluid  approximation  [ KLEI76  ]  which  takes  into  account  only 
the  mean  rates  of  the  arrival  and  the  departure  processes. 
It  leads  to  a  first  order  differential  equation  for  the 
distribution  of  the  number  of  customers  in  the  system. 
However,  it  does  not  reflect  the  variability  of  the 
stochastic  processes  and  thus  gives  an  optimistic  view  of 
congestion  and  waiting  times. 

The  diffusion  approximation  [ KOBA74a,  KOEA74b,  GELE75, 
KLEI76,  GP  76],  is  a  second  order  approximation  in  that  it 
reflects  the  variance  of  the  arrival  and  the  departure 
processes.  It  leads  to  a  second  order  differential  equation 
for  the  probability  distribution  of  the  number  of  customers 
in  the  system  (the  Fokker-Pla nek  equation)  that  corresponds 
directly  to  the  Chapman-Kolmogorov  equation  for  discrete- 
state  Markov  processes.  The  diffusion  approximation  assumes 
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heavy  load  in  the  system  as  it  cannot  represent  idle  periods 
accurately.  It  is  not  only  applicable  for  single  server 
systems  but  also  for  queueing  networks,  and  it  does  not  need 
the  assumptions  required  for  locally  balanced  systems.  It 
can  be  applied  to  the  important  case  of  general  networks, 
general  service  time  distributions,  and  FCFS  disciplines. 
However,  presently  it  is  restricted  to  one  class  of 
customers  and  load  independent  servers. 


Important  Points  in  the  Model  Space 


In  table  1  it  is  shown  how  the  models  surveyed  above  and 
seme  other  important  models  fit  into  the  framework  defined 
earlier.  It  should  be  kept  in  mind  that  this  classification 
of  the  models  is  just  according  to  abstract  model 
properties.  It  does  not  include  any  aspect  of  their 
application  to  computer  systems.  The  solution  methods  are 
not  a  classification  criterion,  although  they  may  be 
mentioned  in  the  comments  to  emphasize  the  practical 
importance  of  a  particular  model. 


Abbreviations  of  table  1: 
cch:  changing  classes 

Cox  :  representation  by  the  method  of  stages  by  Cox 

exp.  mean:  exponential  distribution,  represented  by  the  mean 

gen.  nw,  :  general  network 

gen.distr.  :  general  distribution 

hierarch.  :  hierarchical  model 

hyper.  :  hyperexponential  distribution 

hypo.  :  hy poexponential  distribution 

l.b.  :  local  balance  disepline  and  distribution  combination 

l.d.  :  load  dependent  server  characteristic 

l.i.  :  load  independent  server  characteristic 

prio.  : priority  discipline 

w.l. V.  :  workload  vectors 


Table  1:  Model  classification  vectors 
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2,4  Modelling  Computer  Systems 


The  models  outlined  in  the  previous  sections  can  be  used 
to  analyze  a  wide  range  of  computer  systems.  The  activities 
of  a  system  which  are  to  be  modelled  must  be  selected 
according  to  the  performance  measures  of  interest. 
Corresponding  to  the  four  major  time  scales  of  events  in 
computer  systems,  four  levels  of  system  activities  and 
ccnseguently  four  modelling  levels  can  be  distinguished: 

•  the  microprogram  level 

•  the  instruction  level 

•  the  operating  system  level 

•  the  user  interface  level 


In  most  modelling  studies  the  selection  of  the  system 
components  and  activities  to  be  modelled  is  made  more  or 
less  in  accordance  with  this  level  structure.  The  levels  of 
hierarchical  models  usually  correspond  to  these  modelling 
levels  as  well.  As  Courtois  points  out  [COUR72],  models  are 
nearly  decomposable  into  subsystems  if  the  time  scale  of  the 
events  within  the  submodel  is  smaller  by  an  order  of 
magnitude  than  the  time  scale  of  interactions  between  the 
submodel  and  the  rest  of  the  model.  In  the  following,  we 
lock  at  the  four  modelling  levels  in  more  detail. 


Microprogram  Level 


The  microprogram  level  is  the  lowest  level.  It  contains 
all  activities  of  shorter  duration  than  single  CPU 
instructions.  Its  time  scale  is  small  enough  to  cover 
microinstruction  cycles,  cache  memory  cycles,  and  core 
memory  cycles.  The  components  of  a  computer  that  can  be 
modelled  at  the  microprogram  level  include  the  arithmetic 
and  logical  unit  (ALU)  ,  the  microprogram  storage,  main 
memory  modules,  high  speed  cache  memory,  and  I/C  processors 
accessing  the  main  memory.  Models  on  the  microprogram  level 
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can,  for  exaaple,  be  used  to  study  how  high  speed  cache 
memory,  cycle  stealing,  interleaving  of  memory  blocks,  or 
pipelining  affect  the  instruction  rate  of  the  CPU.  They  can 
also  be  applied  to  other  parts  of  computers  using 
microprogrammed  logic.  There  are  very  few  studies  so  far 
that  include  modelling  on  the  microprogram  level  [  BS  76]. 


Instruction  Level 


The  instruction  level  is  above  the  microprogram  level. 
Its  time  scale  is  on  the  order  of  instruction  execution 
times.  With  respect  to  the  CPU,  this  level  is  not 
particularly  interesting  for  analytic  modelling.  However, 
it  is  very  important  with  respect  to  the  I/O  subsystems  of 
computers.  The  components  of  the  I/O  subsystems  that  are 
generally  modelled  on  this  level  are  I/O  processors  or  I/O 
channels,  I/O  devices,  control  units,  peripheral  processors, 
etc. ,  depending  on  the  architecture  of  the  computer  system 
that  is  to  be  modelled.  The  problems  that  can  be 
investigated  on  this  level  are,  for  example,  the  binding  of 
physical  channels  to  logical  channels,  the  effects  of  disk 
scheduling  algorithms,  the  load  dependent  behaviour  of 
direct  access  devices,  and  the  interaction  and  the  amount  of 
parallelism  among  the  components  of  the  I/O  subsystems. 
Some  relative  models  of  I/O  systems  have  been  designed,  e.g. 
[CD  73,  ZAfl076,  SILH77].  However,  due  to  limitations  of  the 
measurement  tools  for  these  activities,  they  are  rarely 
included  in  models  using  the  absolute  approach  [BCBKTD75]. 


Operating  Systems  Level 


The  operating  system  level  is  probably  the  most 
freguently  modelled  level  of  computer  systems.  Its  time 
scale  covers  CPU  instruction  bursts  between  interrupts  or 
dispatcher  runs,  and  entire  I/O  operations.  Two  major 
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approaches  to  modelling  on  the  operating  systems  level  can 
b€  found: 

The  physical  approach  is  limited  to  the  hardware  aspects 
of  computer  systems,  looking  only  at  busy  times  and  numbers 
and  length  of  activity  bursts  of  hardware  components  such  as 
the  CPU  and  I/O  channels.  It  does  not,  for  example, 
distinguish  between  different  types  of  I/O  operations  such 
as  file  I/Cs,  page  faults,  swapping  I/Os,  etc.  or  between 
problem  computation  and  system  overhead  [CHIU71,  M00R71, 
BH  74,  GIAH76,  ROSE76  ]. 

The  logical  approach  includes  logical  information  about 
the  operating  system.  Swapping  of  address  spaces  [CHEN75, 
BBC77],  paging  behaviour  of  programs  as  a  reaction  to 
restricted  main  memory  size  [  DG  75,  WB  76],  and  interactive 
terminals  [fl00R71,  SU  75,  H0NT75  ]  can  be  represented  in  the 
models,  as  well  as  set-up  stages  for  operator  actions 
[ BW  74,  GIAH76],  or  passive  resources  like  I/O  processors, 
non-shared  data  sets,  and  non-reentrant  program  code 
[KELL77]. 

A  third  approach  on  the  operating  system  level  was  taken 
by  Boyse  [B0YS71].  He  modelled  a  computer  system  as  an 
abstract  machine^  just  as  a  single  class  of  programs  sees 
it. 

Models  on  the  operating  system  level  are  mainly  used  to 
locate  bottlenecks  in  the  system  and  to  predict  changes  in 
performance  measures  of  systems  due  to  changes  in  the 
configuration,  the  operating  system,  or  the  workload  of  the 
system. 


User  Interface  level 

The  user  interface  level  is  the  top  most  level.  It 
covers  entire  user  interactions  from  terminals  or  batch 
jobs.  Its  time  scale  is  in  the  order  of  the  system  response 
tine. 
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Some  early  studies  modelled  the  computer  systems  only  on 
the  user  interface  level  [SCHE67,  LS  72].  In  other  models, 
it  is  used  as  the  top  level  of  hierarchical  models  [MW  74, 
BEC77,  TS  77,  NEIL76,  KSS  77],  Modelling  on  this  level  can 
represent  interactive  terminals  and  limited  multiprogramming 
levels  due  to  a  limited  number  of  regions  or  limited  main 
memory.  It  may  also  deal  with  the  delay  of  batch  programs 
on  the  printers  (spooling  devices  are  usually  not  modelled 
on  the  operating  systems  level) .  The  user  interface  models 
are  mostly  used  to  determine  the  fraction  of  time  the  system 
spends  at  various  multiprogramming  levels.  These  fractions 
are  then  used  in  models  on  the  operating  systems  level,  and 
to  calculate  response  times. 


2,5  Survey  of  Case  studies 

In  this  section  some  case  studies  of  models  using  the 
absolute  approach  are  described.  We  chose  the  case  studies 
listed  here  because  they  had  good  discussions  about 
estimating  input  parameters.  In  order  to  keep  the  survey 
concise,  the  general  part  of  the  description,  which  applies 
to  all  case  studies,  is  organized  in  table  form. 


«  22  1  §_  Mo  d  e  1_LM  0  0|21J 

Model  Type  =  central  server  model 

single  class 
ICFsPB  discipline 
exponential  distributions 
load  independent  servers 

Modelling  Level:  operating  systems  level 

physical  approach  and  interactive  terminals 

Monitor  used:  special  event  trace  monitor 

Input  Data:  mean  service  times  . 

transition  probabilities 
multiprogramming  level 

Solution  Method:  global  balance  technigue 
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Hesults;  waiting  tines,  mean  gueue  lengths,  and 

utilizations  per  device,  cycle  times, 
response  tines 

Because  Moore  found  that  the  parameter  estimation  was  a 
critical  point  in  modelling,  he  used  upper  and  lower  bounds 
of  the  input  parameters  to  calculate  envelopes  for  the 
performance  measures.  By  varying  the  number  of  bursts  and 
the  burst  lengths,  he  ^realized  that  these  did  not  affect  the 
model  very  much  as  long  as  the  total  amount  of  work  for  the 
servers  did  not  change.  The  distribution  of  the  actual  CPU 
bursts  was  reported  to  be  exponential  due  to  the  LCFSPR 
discipline,  whereas  the  distribution  of  the  CPU  mean  time 
between  two  I/O  requests  of  a  program  was  hyperexponential. 
This  finding  is  compatible  with  the  theoretical  derivation 
in  [BCMP75]  that  LCFSPR  is  a  Poisson  stream  preserving 
discipline. 

Moore  investigated  a  series  of  changes  to  the  original 
configuration  of  the  computer  system  in  order  to  find  an 
optimally  balanced  configuration. 


BoYse*s  Model  FBOYSTI] 


Model  Type  =  cyclic  model 

single  class 
LCFSPR  discipline 
exponential 

load  independent  servers 

Modelling  Level;  operating  systems  level;  abstract  machine 

model  for  program  behaviour,  including  paging 

Monitor  used:  special  event  trace  monitor 

Input  Data:  mean  service  times 

completion  probabilities 
multiprogramming  level 

Solution  Method:  global  balance  technique 

Results:  throughput 


The  distribution  of  the  mean  CPU  time  between  I/O 
requests  was  found  to  be  hyperexponential,  but,  because  the 
assumption  of  exponential  distributions  did  not  affect  the 
results  very  much,  exponential  distributions  were  used  in 
the  model.  The  distribution  of  the  number  of  cycles  of  jobs 
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through  the  system  before  completion  was  measured  to  be 
geometric.  The  part  of  the  workload  that  was  modelled 
consisted  only  of  very  narrowly  defined  job  classes,  such  as 
FORTRAN  compilations.  For  the  CPU  capacity  only  a  fraction 
of  the  actual  capacity  of  the  CPU  was  used  to  account  for 
overhead  and  for  the  presence  of  jobs  not  belonging  to  the 
very  narrowly  defined  job  class  of  interest.  Boyse 
investigated,  among  other  issues,  the  advantage  of  shared 
program  code. 


Chiuls_^d€l_LCHIU73J 


Model  Type  =  cyclic  model 

single  class 
FCFS  discipline 
exponential 

load  independent  servers 


Modelling  Level:  operating  systems  level,  hardware 

approach 

Monitors  used:  hardware  monitor,  accounting  monitor 

(SMF) 

Input  data:  mean  service  times 

completion  probabilities 
multiprogramming  level 

Solution  Method:  special  solver  for  cyclic  models 

Results:  utilizations  of  servers,  throughput, 

elapsed  time  for  benchmark  experiment 


Chiu  took  measurements  of  the  system  under  normal  and 
under  controlled  conditions.  In  the  controlled  environment 
he  changed  the  multiprogramming  level  by  varying  the  number 
of  initiators. 


He  investigated  the  effects  of  changing  the  scheduling 
strategy  and  of  replacing  main  memory  by  faster  memory.  The 
cyclic  model  was  reported  to  be  more  accurate  than  a  central 
server  model. 
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The  Models  of  Bhandiwad  and  Williams 

Model  with  Set-Op  Stage  FEW  741 


Model  Type  =  general  network 

single  class 
FCFS  discipline 
exponential 

load  independent  servers 

Modelling  Level:  operating  systenis  level,  physical 

approach,  set-up  stage 

Monitors  used:  hardware  monitor,  accounting  monitor, 

(SMF) 


Input  data:  relative  arrival  rates 

mean  service  times 
multiprogramming  level 

Solution  Method:  Euzen*s  algorithm 

Resul ts: 


utilizations  of  servers,  elapsed  time 
for  benchmark 


Bhandiwad  and  Williams  indicated  that  overhead  is  a 
function  of  the  multiprogramming  level.  They  incorporated  a 
facility  into  their  model  to  specify  this  function.  They 
did  not  report,  however,  whether  they  used  this  feature-  of 
their  model.  Using  benchmarks,  they  took  the  number  of 
partitions  as  multiprogramming  level  of  their  model.  The 
actual  multiprogramming  level  of  the  system  was  quite 
stable,  if  all  jobs  of  the  benchmark  were  read  into  the 
system  at  the  beginning  of  the  experiment.  At  the  end  of 
the  execution  of  the  benchmark,  some  jobstreams  were 
finished  while  others  still  contained  jobs.  These  end 
effects  introduced  slight  errors.  Bhandiwad  and  Williams 
did  not  report  how  they  specified  the  service  time  of  the 
set-up  stage.  As  a  general  result,  they  claim  that  the 
errors  in  response  times  and  throughput  due  to  errors  of  a 
factor  in  the  input  data  are  smaller  than  the  utilizations 
times  this  factor.  An  interesting  point  in  their  study  was 
that,  in  the  validation,  they  "tuned”  the  input  parameters 
so  that  the  resulting  utilizations  matched  the  measured 
utilizations.  Only  then,  they  introduced  the  changes  into 
the  system  which  they  wanted  to  investigate.  If  the 
proposed  system  changes  involve  changes  in  "tuned" 
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parameters^  the  resulting  performance  predictions  have  to  be 
interpreted  cautiously. 

The  model  was  employed  to  compare  different  operating 
systems  and  to  evaluate  the  effect  of  increased  main  memory 
on  throughput. 


The  Virtual  Memory  Model  f WB  761 


Model  Type  =  hierarchical  model 

a)  central  server  model 
3  classes 

FCFS  discipline 
exponential 

load  independent  servers 

b)  machine  repairman  model 
3  class 

priority  discipline 

exponential 

load  dependent  server 

Modelling  levels:  a)  operating  systems  level, 

logical  approach,  paging 

b)  user  interface  level 

Monitors  used:  not  specified  in  the  paper 

Input  Data:  relative  utilizations  (mean  service 

times  and  burst  rates) 
multiprogramming  level 

Solution  method:  algorithms  by  Sauer  and  Chandy, 

machine  repairman  formulas 

Results:  paging  rates  total,  throughput  by  class, 

time  sharing  response  times 


Williams  and  Bhandiwad  developed  a  theoretical  model  for 
virtual  memory.  They  approximated  the  lifetime  functions  in 
the  vicinity  of  the  measured  point  by  a  linear  function. 
They  showed  how  to  include  CPU  overhead  for  paging  in  the 
model  using  a  linear  page  fault  rate  function,  but  they  did 
not  specify  how  to  determine  this  function.  As  well,  they 
suggested  modelling  the  paging  dish  by  a  load  dependent 
server,  but  they  did  not  explain  how  to  do  it.  It  is  not 
clear  how  they  derived  the  think  times  and  the  mean  number 
of  active  terminals  for  their  model. 


They  validated  their  model  with  measurement  data  from 
benchmark  runs  and  then  proceeded  to  predict  the  performance 
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of  the  systeia  under  a  tiae  sharing  load  which  was  four  times 
as  large  as  the  original  time  sharing  load* 


Sevcik  and  Dnruh*s  Hodel  T SU  751 


Model  Type  =  hierarchical  model 

a)  central  server  model 
2  classes 

priority  discipline 
exponential 

load  independent  servers 

b)  machine  repairman  model 
single  class 

FCFs  discipline 
exponential 

load  dependent  servers 

Modelling  levels:  a)  operating  systems  level, 

physical  approach 
b)  user  interface  level 

Monitors  used:  CUE-  TSTRACE,  accounting  monitor 

(PACE) 

Input  Data:  transition  probabilities 

mean  service  times 
multiprogramming  level 

Solution  Method:  Buzen’s  algorithm,  machine  repairman 

formulas 

Results:  response  times  for  time  sharing  jobs, 

throughput  for  batch  jobs 


Sevcik  and  Unruh  modelled  the  priority  of  the  time 
sharing  jobs  at  the  CPU  by  allowing  the  batch  jobs  to 
interfere  with  them  only  at  the  I/O  devices.  This 
interference  was  represented  by  extending  the  I/O  service 
times.  An  optimistic  and  a  pessimistic  bound  on  the 
performance  measures  were  derived.  The  measurements  showed 
that  the  system  had  a  "large  user”  problem:  one  particular 
user  could  influence  the  overall  performance  of  the  system. 
In  such  a  situation  special  care  must  be  taken  in  the 
definition  of  the  workload  for  the  model. 


The  model  was  used  to  systematically  evaluate  a  number 
of  proposed  changes  to  the  system:  increasing  the  number  of 
regions,  adding  another  disk  channel,  introducing  a  faster 
main  memory,  adding  main  memory,  and  changing  parameters  of 
the  operating  system. 
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G iamno«s  Model  rGIA»76a1 


Model  Type  =  general  network 

single  class 

local  balance  disciplines 

workload  vectors 

load  independent  servers 

Modelling  Level;  operating  systems  level,  physical 

approach,  set-up  stage 

Monitors  used:  not  specified  in  the  paper 

Input  data:  total  busy  time  of  servers 

number  of  jobs  completed 
active  set-up  time 

Solution  Method:  Euzen's  algorithm 

Results:  elapsed  time  of  benchmarks,  throughput 


Giammo  was  the  first  to  apply  workload  vectors  in  a  case 
study.  He  found  that  a  small  amount  of  detail  (using  only 
one  class  of  jobs,  representing  only  channels,  etc.)  seemed 
to  be  sufficient  for  his  study.  However,  he  did  not  report 
on  the  more  sensitive  performance  measures  like  queue 
lengths  and  response  times,  whose  accurate  prediction  may 
require  more  detail  in  the  model.  He  suggested  a  load 
dependent  server  for  the  set-up  stage,  but  in  his  model  he 
used  a  load  independent  server.  The  set-up  stage  was 
reported  to  be  the  most  critical  part  of  the  model. 


The  model  was  used  to  validate  the  modelling  assumptions 
and  the  parameter  specification  technique. 


Rose*  s  Model  f  ROSE76  1 


Model  Type  =  central  server  model 

2  classes 

local  balance  disciplines 
exponential 

load  independent  servers 

Modelling  Level:  operating  systems  level,  physical 

approach 

Monitors  used:  hardware  monitor,  TSTRACE,  accounting 

monitor  (SMF) 
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Input  Data:  transitiop  pro^jabili ties 

nean  service  tiaes 
aultiprograiaing  level 

Solution  Method:  local  balance  solver 

Results:  utilzaticns  of  the  servers 

Rose  explained  extensively  how  he  estimated  the  input 
parameters  of  the  models.  He  calibrated  his  model  by 
adjusting  the  I/O  service  times  until  the  CPU  utilization  of 
the  model  matched  the  measured  CPU  utilization.  This  seems 
to  be  his  method  to  account  for  the  load  dependent  service 
rates  and  the  parallelism  in  the  I/O  subsystem.  This  may  be 
a  valid  method  for  predictions  if  the  calibration  factor  can 
be  determined  by  measurements  on  a  system  similar  to  the 
proposed  system.  But  if  this  is  not  possible,  it  is  very 
difficult  to  predict  the  calibration  factor. 

Rose  used  his  model  mainly  to  validate  his  modelling 
assumptions.  He  also  reported  the  evaluation  of  some 
changes  in  the  modelled  computer  systems. 


The  Model  of  Krzesinski,  Gerber,  and  Teunissen  PKGT  771 


Model  Type  =  general  network  model 

i3  classes,  class  changes 
local  balance  disciplines 
Cox  type  distributions 
load  dependent  service  rates 

Modelling  Level:  operating  system  level  for  the  CPU  and 

instruction  level  for  the  I/O  subsystem, 
physical  approach 

Monitor  used:  event  trace  software  monitor 

Input  Data:  service  time  distributions  (Cox  type) 

transition  probabilities 
multiprogramming  level 

Solution  Method:  global  balance  equations  reduced  by 

exploiting  local  balance  properties  of  the 
system;  final  calculation  recursive  like 
Buzen’s  algorithm 

Results:  utilizations,  queue  lengths,  times-in-system 


An  integrated  performance  evaluation  system  was 
developed,  A  software  event  trace  monitor  that  has  been 
modified  for  this  purpose  produced  all  the  input  data 
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required  for  the  model.  The  monitoring  overhead  for  a 
detailed  event  trace  was  reported  to  be  not  greater  than  5 
percent.  The  solution  system  can  handle  all  variations  of 
local  balance  models  as  described  in  [BCMP75].  It  accepts 
batch  input  or  conversational  input.  The  computational 
algorithm  generates  the  global  balance  equations,  reduces 
them  in  several  steps  to  a  much  smaller  system  of  equations, 
and  finally  calculates  the  normalizing  constant  using  a 
recursive  algorithm. 


A  very  large  model  with  19  servers  and  1 
allowing  for  class  changes,  was  validated 
predict  a  computer  system's  performance  under 
wcrklcad  and  in  the  configuration. 


3  job  classes, 
and  used  to 
changes  in  the 


2,6  Review  of  Parameter  Estimation  Methods 


An  important  step  in  establishing  a  queueing  network 
model  is  parameter  estimation,  the  determination  of  the 
input  parameters  from  the  measurement  data.  Before  we 
review  the  parameter  estimation  methods  applied  in  some  case 
studies,  we  comment  on  the  parameter  requirements. 

Locally  balanced  models  require  as  input  parameters  only 
the  multiprogramming  level  and  the  relative  utilizations  of 
the  servers.  For  multiple  class  models,  these  parameters 
must  be  broken  down  by  class.  As  a  consequence,  Kobayashi 
and  Reiser  [  KR  75]  and  Giammo  [GIAM76b]  point  out  that  it  is 
possible  tc  specify  the  relative  utilization  of  a  server  as 
the  total  number  of  operations  requested  by  a  task  divided 
by  the  service  rate.  Thus  all  dynamic  aspects  of  a 
program's  behaviour,  such  as  transitions  from  one  server  to 
another  and  the  lengths  of  the  service  bursts  on  the 
servers,  can  be  ignored.  The  workload  vector  is  sufficient 
tc  calculate  the  relative  utilizations.  As  will  be  shown 
later,  there  are  other  equivalent  ways  to  specify  a 
consistent  set  of  relative  utilizations.  The  authors  of 
most  of  the  papers  we  have  reviewed,  however,  did  not 
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recognize 
disa  ggrega 
tiroes  and 
norma  1 ize 
essential 
technical 


this  easy  method  of  parameter  specification  and 
ted  the  measurement  data  to  calculate  mean  service 
transition  probabilities,  just  to  aggregate  and 
them  afterwards.  The  normalization  is  not  an 
part  of  the  algorithms,  but  is  done  only  for 
reasons  (to  maintain  numerical  stability). 


CPU  Service  Requirements 

The  basic  formula  used  for  deriving  the  mean  CPU  service 
times  is: 

U  ♦  T 

M  = - 

IC 

where  U  is  the  CPU  utilization,  T  is  the  measurement  period, 
and  10  is  the  total  number  of  CPU  bursts,  usually  assumed  to 
be  equal  to  the  total  number  of  I/O  requests.  For  single 
class  models  this  formula  is  sufficient  to  determine  the 
mean  CPU  service  time  [BW  74].  It  automatically  includes  the 
CPU  overhead.  It  should  be  noted  that  the  service  time  is 
actually  a  consequence  of  two  quantities,  the  service 
request  and  the  server  capacity.  Few  authors  malce  this 
distinction  [ DG  75,  KR  75].  Moore  [M00R71]  determined  the 
mean  CPU  service  times  from  event  traces  of  the  modelled 
system.  He  distinguished  between  two  definitions  of  bursts. 
In  the  following,  the  time  between  two  successive  I/O 
requests  is  called  a  logical  burst,  and  the  time  a  task 
spends  without  interruption  at  the  CPU  is  called  a  physical 
burst.  For  his  model  Moore  used  the  physical  bursts.  In 
local  balance  models  the  burst  definition  is  irrelevant. 
Only  the  total  number  of  operations  a  task  requires  and  the 
CPU  service  rate  are  necessary  as  input  for  the  solution 
algorithm.  Giammo  [GIAM76a]  uses  the  total  busy  times  as 
recorded  by  the  monitor  as  the  relative  utilizations  for  his 
models  . 


These  methods  are 
locally  balanced  models. 
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all  mathematically  equivalent  for 
but  certainly  Giammo's  method  is 
the  simplest.  However,  in  a  multiple  class  context,  it  may 
not  always  be  easy  to  estimate  the  total  busy  times  of  the 
servers  by  class  for  a  proposed  change  in  the  system. 


If  paging  is  introduced  into  the  model,  additional  CPU 
overhead  due  to  paging  can  be  represented.  Williams  and 
Bhandiwad  [WB  76]  suggested  extending  the  CPU  mean  service 
times  linearly  with  the  paging  rate.  They  did  not  report  how 
the  extension  factor  can  be  found. 


In  multiple  class  models  the  distribution  of  the 
overhead  among  the  classes  is  a  difficult  problem,  because 
the  accounting  monitors  that  are  generally  used  to  obtain 
the  break  down  of  the  measurements  by  class  (such  as  SMF 
[IB!l76a])  do  not  record  all  CPU  time.  Some  CPU  time  is 
unaccounted,  and  the  total  CPU  busy  time  can,  in  general, 
only  be  recorded  by  hardware  monitors  (see  chapter  3).  The 
unaccounted  CPU  time  has  to  be  spread  over  the  customer 
classes.  Rose  [FOSE76]  suggested  applying  the  following 
formula  to  calculate  the  mean  CPU  service  times  of  class  i: 


M(i)  = 


0  ♦  T  *  f  (i) 
IO(i) 


where  U  is  the  total  CPU  utilizat 
period,  f  (i)  is  the  relative  fracti 
class  i,  and  IO(i)  is  the  number  of 
For  the  determination  of  10  (i),  aga 
arises,  as  normally  not  all  I/O 
accounting  monitors.  To  deterain 
using  the  relative  fraction  of 
active  at  a  time.  This  would  sp 
equitably  among  all  customer  clas 
to  use  the  relative  fraction  of  the 
the  classes.  This  implies  the  assu 
unaccounted  CPU  time  is  proportiona 
time,  which  certainly  need  not  be 
unaccounted  CPU  time  is  distrib 
according  to  the  number  of  I/O 


ion,  T  is  the  measurement 
cn  for  the  CPU  time  of 
I/O  requests  of  class  i. 
in  a  distribution  problem 
requests  are  recorded  by 
e  f(i).  Rose  suggested 
the  number  of  customers 
read  out  the  CPU  time 
ses.  Another  proposal  is 
accounted  CPU  time  of 
mption  that  the  amount  of 
1  to  the  accounted  CPU 
the  case.  In  [KSS77]  the 
uted  among  the  classes 
requests  because  it  is 
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assuned  that  most  CPU  overhead  is  connected  with  I/O 
requests. 

For  models  not  satisfying  local  balance,  usually  the 
first  two  moments  of  the  probability  distributions  of  the 
service  times  have  to  be  specified  [CHR75b,  SITZ77,  SC  75], 
In  these  cases  the  results  may  be  sensitive  to  how  "bursts” 
are  defined.  However,  there  are  no  published  models  using 
the  absolute  approach  that  involve  parameter  estimation  for 
models  not  satisfying  local  balance. 


Transition  Probabilities 

For  cyclic  models  [B0YS71,  CHIU71  ],  only  completion 
probabilities  have  to  be  specified.  Locally  balanced 
network  models  do  not  require  transition  probabilities  at 
all,  if  the  relative  utilizations  of  their  servers  are 
calculated  via  the  workload  vectors  [  KR  75,  GIAfl77]. 
However,  most  of  the  authors  calculate  the  transition 
probabilities  from  the  relative  access  frequencies  of  the 
modelled  devices  [M00R71,  BW  74,  SU  75].  Moore  suggested 
deriving  them  from  device  busy  times,  service  rates,  and 
block  sizes.  This  is  a  very  questionable  method,  as  it 
cannot  take  into  account  the  overlapped  activities  in  the 
I/O  subsystem. 

If  paging  is  modelled,  the  transition  probabilities  must 
reflect  the  paging  I/O  requests  as  well,  whose  rate  can  be 
determined  from  the  life-time  functions  of  the  programs. 
Euzen  [BUZE71]  gave  formulas  for  a  synthetic  life-time 
function  and  showed  how  to  calculate  the  parameters  for  the 
central  server  model  from  them.  Denning  and  Graham  [ DG  75] 
gave  formulas  for  paging  I/O  requests  that  can  use  measured 
life-time  functions.  Williams  and  Bhandiwad  [ HB  76]  avoided 
calculating  the  transition  probabilities  explicitly  but 
derive  the  relative  utilizations  directly  from  measured 
system  parameters. 
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For  multiple  class  models,  the  problem  of  the 
ucacccunted  I/O  requests  arises  again.  Bose  [ROSE76] 
distributes  the  total  number  of  the  I/O  requests  recorded  by 
hardware  monitor  according  to  the  ratios  of  the  accounted 
I/O  requests.  This  again  implies  the  assumption  that  the  job 
classes  cause  unaccounted  I/O  requests  in  proportion  to 
their  accounted  I/O  requests. 


I/O  Service  Times 


representation  of  the  I/O 
If  the  channels  and  the 
if  only  the  channels  are 


To  determine  the  correct 
subsystem  is  a  difficult  task, 
devices  are  coalesced,  or 
represented  (as  it  is  typically  done),  the  I/O  server  should 
be  a  composite  server  with  load  dependent  characteristics. 
As  the  measurements  often  do  not  provide  enough  details  for 
such  a  representation,  the  simpler  methods  outlined  below 
aie  used.  The  formula  most  frequently  used  to  derive  the 
mean  I/O  service  times  for  queueing  models  is  similar  to  the 
formula  shown  for  the  mean  CPU  service  times; 


M(i) 


U (i)  *  T 

IC(i) 


where  M(i)  is  the  mean  service  time  of  server  i,  U(i)  is  its 
utilization,  IO(i)  is  the  number  of  I/O  requests  to  this 
server,  and  T  is  the  measurement  period.  This  method  has 
several  shcrtcomings: 

•  It  models  the  pattern  of  departures  (interdeparture 
time  distribution)  rather  than  the  pattern  of  the 
service  (service  time  distribution) . 

•  It  cannot  represent  the  parallel  activity  in  the 
channel/device  subsystem. 

•  It  cannot  account  for  load  dependent  and  access 
pattern  dependent  seek  tiroes. 
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•  Using  only  channel  service  times  results  in  too  high 
throughput,  as  the  seek  time  is  not  included  into  the 
model,  while  using  combined  channel  and  device  service 
times  results  in  too  low  throughput,  as  the  channel  is 
blocked  during  the  seek. 


T 
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t  for  the  delay  due  to  disk  seek  time,  Chiu 
the  mean  seek  time  to  the  mean  service  time 
above.  As  the  seek  and  the  wait  for  the 
performed  in  parallel,  and  as  seeks  for 
uests  may  be  done  in  parallel,  this  method  can 
answers  only  for  lightly  loaded  systems,  where 
f  parallelism  within  the  I/O  subsystem  is 
ven  more  problematic  in  this  respect  is  the 
f  the  I/O  service  times  from  hardware 
and  block  sizes  [  BW  74]. 


If  the  relative  utilizations  are  calculated  by  mean 
service  times  and  transition  probabilities,  their  derivation 
for  multiple  class  models  is  straightforward.  The  mean  I/O 
service  times  should  be  independent  of  the  customer  class. 
They  may,  however,  vary  among  devices  (even  for  the  same 
type)  .  For  example,  disks  may  have  different  access 
patterns.  If  the  total  channel  busy  times  are  used  to 
calculate  the  relative  utilizations,  it  may  not  be  easy  to 
break  them  down  to  classes. 


The  Multiprogramming  Level 


There  are  many  different  methods  reported  for  the 
determination  of  the  multiprogramming  level.  In  general  the 
measured  or  derived  multiprogramming  level  will  not  be  an 
integer.  Rose  [ROSE76]  suggested  evaluating  the  models  using 
the  next  higher  and  the  next  lower  integer  multiprogramming 
levels  and  interpolating  linearly.  He  determined  the 
multiprogramming  levels  for  his  models  by  the  SHF  analyzer, 
an  evaluation  program  for  the  accounting  monitor  SMF. 
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Bhandiwad  and  Williams  [ BW  74]  used  the  number  of 
partitions  (initiators)  as  the  multiprogramming  level.  This 
is  only  acceptable  for  operating  systems  with  a  fixed  number 
of  partitions  and  for  benchmark  studies.  In  this  case^  the 
number  of  active  partitions  is  rather  constant  except  at  the 
end  of  the  benchmark.  Another  method  is  to  use  the  internal 
queue  lengths  as  reported  by  software  monitors,  as  Sevcik 
and  Unruh  did  [SO  75].  One  must,  however,  be  aware  that 
some  of  the  available  monitors  may  not  display  all  the 
relevant  queues  and  all  tasks  counted  may  not  be  user  tasks. 


If  the  time  in  system  of  the 
following  method  may  be  used 
multiprogramming  level: 


MIL 


resident  times 
T 


tasks  are  known,  the 
to  determine  the 


where  MPL  is  the  multiprogramming  level  and  T  is  the  length 
of  the  mesurement  period. 

A  method  that  avoids  specifying  a  particular 
multiprogramming  level  was  used  in  [KSS77].  From  the  number 
of  the  completed  jobs  and  the  measurement  interval  length, 
an  average  arrival  rate  is  calculated.  This  arrival  rate  is 
then  used  for  the  user  interface  model  of  a  hierarchical 
model.  The  network  model  on  the  operating  systems  level  is 
evaluated  for  all  possible  multiprogramming  mixes.  ‘ 
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3.  Monitors 


In  order  to  specify  the  parameter  values  for  queueing 
network  models,  we  use  monitors  to  measure  the  corresponding 
quantities  in  the  real  systems.  Monitors  are  tools  that 
observe  the  operation  of  a  computer  system  over  a  period  of 
time  and  record  the  values  of  the  variables  that  are 
considered  to  be  significant  for  evaluating  the  system*s 
operation , 


For  this  purpose  a  measure ment  model  is  defined  whose 
variables  are  the  variables  of  interest  in  the  computer 
system.  The  state  of  this  model  is  specified  by  the  vector 
of  the  model  variables.  The  state  space  of  the  measurement 
model  consists  of  all  feasible  value  combinations  of  the 
model  variables.  For  all  feasible  transitions  between  the 
states  the  corresponding  events  in  the  computer  system  have 
to  be  determined.  Eased  on  this  measurement  model,  there  are 
two  basic  approaches  tc  the  monitoring  of  computer  systems: 

The  event  trace  mode,  which  records  all  state  transition 
events  and  thus  permits  a  reconstruction  of  the  sequence 
of  states  of  the  computer  system.  From  an  event  trace 
the  interevent  time  probability  distribution  and  the 
state  probability  distribution  can  be  calculated. 

The  sam^lir^  mode,  which  inspects  the  computer  system  at 
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application  of  measurements,  a  specific  measurement  model 
should  be  designed. 

In  queueing  network  models,  four  important  levels  of 
detail  of  the  measurements  can  be  identified.  These  levels 
correspond  to  the  modelling  levels  defined  in  section  2.4. 
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The  aicroprograa  level ;  The  states  are  the  contents  of 
the  microprogram  registers.  The  events  are  the 
executions  of  microinstructions  and  memory  accesses.  On 
this  level  the  effects  of  memory  interleaving,  cycle 
stealing,  and  high  speed  cache  memory  can  be  Studied. 


The  instruct  ion  level:  The  states  are  the  contents  of 
the  CPU  registers  and  the  status  registers  of  the  CPU 
and  the  I/O  subsystem.  The  events  are  the  executions  of 
CPU  instructions  and  the  executions  of  channel  and 
device  commands. 

The  operating  system  level:  The  states  of  this  level  are 
the  states  of  the  programs  as  defined  by  the  operating 
system,  e,g.  ready,  blocked,  running,  etc.  The  events 
are  calls  for  supervisor  services,  I/O  initiations, 
interrupts,  etc. 


The  user  interface  level:  The  state  is  defined  by  the 
multiprogramming  mix,  and  the  events  are  initiations  and 
terminations  of  tasks. 


The  instruction  level  and  the  microprogram  level  can 
also  be  called  the  hardwa re  level,  whereas  the  operating 
systems  level  and  the  user  interface  level  are  software 
levels ,  Queueing  network  models  have  so  far  most  frequently 
been  applied  on  the  operating  system  level  and  the  user 
interafce  level.  A  similar  level  structure  was  suggested  by 
Svcbodova  f  SV0P76  ], 


As  neither  the  event  trace  nor  the  samples  relay  much 
useful  information  by  themselves,  measurement  data  generally 
must  be  reduced  to  a  small  set  of  meaningful  system 
statistics.  The  reduct  ion  can  be  done  immediately  during 
the  monitoring  process  or  afterwards  as  a  separate  pass  over 
the  collected  data.  In  the  first  case,  only  the  reduced  data 
need  be  retained,  whereas  in  the  second  case  all  monitoring 
data  must  be  recorded.  During  the  monitoring  interval,  the 
states  of  the  system  may  be  displayed  immediately  on  the 
monitoring  equipment. 
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Three  major  types  of  monitors  can  be  distinguished: 

Hardware  monitors  are  devices  which  are  attached  to 
various  points  of  the  hardware  of  the  computer  system. 
They  can  read  register  contents,  count  events,  etc. 

Software  monitors  are  routines  inserted  into  the 
software  of  the  computer  system  which  record  and  in  seme 
cases  also  reduce  the  measurement  data. 

H  y bri d  lonJ,tors  are  combinations  of  the  previous  two 
types.  Measurement  information  is  moved  by  system 
routines  into  special  monitoring  registers,  from  where 
it  is  recorded  by  hardware  monitors. 

Generally,  hardware  monitors  can  be  applied  for 
measurements  on  the  hardware  levels,  and  software  monitors 
can  be  applied  tor  measurements  on  the  software  levels. 

Monitors  have  a  wide  range  of  applications.  Major 
application  areas  are: 

•  accounting  for  computing  services; 

•  determining  the  utilization  of  the  resources  of  the 
computer  system  for  planning  purposes; 

•  performance  evaluation  by  supplying  parameters  for 
models  such  as  analytic  or  simulation  models. 

There  is,  in  general,  no  "best”  measurement  method,  but 
for  each  application  the  appropriate  level  of  detail,  the 
appropriate  monitor  type,  and  the  appropriate  monitoring 
mode  must  be  chosen.  Good  descriptions  of  monitoring  and 
data  reduction  methodology  can  be  found  in  rsV0B76,  DRUM73, 
and  LUM75]. 
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3.1  Honitorinq  Tools 

Hardware  Wcnitors 


A  hardware  monitor  'reads*  the  state  of  the  computer 
system  via  electronic  probes  which  are  attached  to  its 
circuitry^  and  either  records  the  measurements  on  an 
external  device,  or,  as  in  most  cases,  reduces  it 
immediately.  The  reduction  operations  that  typical  hardware 
monitors  can  perform  on  the  data  are  counting  events  or 
recording  times  during  which  the  system  is  in  interesting 
states. 

Although  basic  logical  operations  such  as  *and*  and  *cr* 
can  be  performed  on  the  sensed  signals,  the  scope  and  the 
flexibility  of  hardware  monitors  are  very  limited.  They  can 
record  only  physical  states  of  the  system,  such  as  register 
contents,  but  no  logical  information,  e.g.,  which  program 
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Hardware  monitors  have  the  major  advantage  that  they  are 
operated  separately  from  the  monitored  system.  This  means 
that  they  do  not  interfere  with  the  operation  they  are 
measuring,  and  they  can  record  very  high  event  rates.  The 
times  of  the  events  can  usually  be  measured  very  precisely. 

operative 


Hardware  monitors  stay 
system  crashes. 


even  if  the  monitored 


Software  Monitors 


Software  monitors  can  basically  record  all  information 
which  is  accessible  to  programs.  This,  along  with  their 
great  flexibility  in  the  selection  and  reduction  of  the 
measurement  data,  makes  them  very  powerful  tools  for 
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monitoring  on  the  software  level  and,  to  a  lesser  extent,  on 
the  instruction  level. 

As  software  monitors  use  the  resources  whose  usage  they 
are  to  measure,  they  perturb  the  system  significantly  and, 
in  the  extreme,  may  render  the  results  meaningless.  The 
usage  of  the  CPD,  the  main  memory,  and  the  I/O  facilities 
by  the  monitor  increases  as  the  amount  of  detail  in  the 
measuEents  and  the  complexity  of  the  reduction  operations 
increase.  Due  to  the  monitoring  overhead,  not  only  is  the 
recorded  usage  higher  than  under  normal  circumstances,  but 
also  the  behaviour  of  the  operating  system  and  the  user 
jobs  may  deviate  from  normal,  particularly  if  parts  of  the 
system  are  highly  utilized.  As  a  result  of  this  we  can 
state  an  analog  to  Heisenberg's  uncertainty  principle:  The 
more  detail  and  accuracy  we  attempt  in  the  measurement  of 
the  activity  of  a  computer  system  with  a  software  monitor, 
the  mere  we  perturb  what  we  want  to  measure. 

In'  some  cases  a  software  monitor  can  make  use  of 
information  that  is  accumulated  by  the  operating  system  for 
its  own  use,  such  as  system  tables,  resource  usage  rates, 
etc.  This  can  reduce  the  monitoring  overhead  considerably. 


•ii  1  flonitors 

Hybrid  monitors,  being  combinations  of  hardware  monitors 
and  software  monitors,  share  the  advantages  of  both,  and 
each  component  reduces  the  disadvantages  of  the  other.  The 
recording  is  done  by  the  external  hardware  monitor,  so  the 
I/C  process  is  completely  unperturbed.  If  no  immediate 
reduction  is  done,  moving  the  measurement  data  into  the 
monitoring  registers  will  also  cause  little  CPD  overhead. 
Also,  all  levels  of  detail  can  be  observed  simultaneously  if 
necessary. 

The  one  great  disadvantage  of  hybrid  monitors  is  that 
they  need  special  hardware  inserted  into  the  computer 
itself,  namely  the  addressable  monitoring  registers.  In 


52 


most  cases,  this  addition  is  extremely  costly  if  it  is  not 
considered  in  the  original  system  architecture,  so  hybrid 
mcnitors  are  used  very  rarely.  An  interesting  realization 
of  a  hybrid  monitor  is  HEHI  (SEBA74), 


3.2  Hcnitcrinq  Modes 

The  two  basic  monitoring  modes  can  be  used  in 
ccmbiration  with  all  three  monitor  types.  As  the  differences 
between  them  are  greatest  for  the  software  monitor,  they 
will  be  compared  with  software  monitors  in  mind. 


Event  Trace  Mode 
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he  most  detailed  data  about  the  operation  of  a  computer 
in  can  be  attained  by  an  event  trace.  Recording  all 
transition  events,  it  can  be  used  for  a  complete 
struction  of  the  interesting  activities  via  a  trace 
n  simulation.  It  also  can  be  used  for  the  analysis  of 
ient  behaviour,  and  it  can  produce  very  detailed 
sties  like  the  proportions  of  the  measurement  period 
the  system  spends  in  certain  states.  The  most 
ntary  event  trace  with  immediate  reduction  is  the 
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Sampling  Mode 

This  mode  involves  sampling  the  state  of  the  system  at 
predetermined  times,  gathering  statistical  data  about  the 
system  states.  Thus  the  amount  of  detail  sampling  monitors 
can  record  is  somewhat  less  than  the  amount  of  detail  event 
trace  monitors  can  capture.  Transient  phenomena  cannot  be 
observed  in  detail.  The  most  freguent  reduction  of  sampling 
data  is  the  calculation  of  the  system  state  probability 
distributions.  In  this  case  an  equilibrium  assumption  has  to 
be  made. 

For  the  sampling  mode,  the  monitoring  data  rate  and  thus 
also  the  overhead  can  be  controlled  by  specifying  the  size 
of  the  sampling  intervals.  Intervals  that  are  too  short 
will  result  in  high  overhead.  The  number  of  samples  should 
be  made  large  enough  to  achieve  a  reasonably  small 
confidence  interval  on  quanitities  of  interest.  A  great 
advantage  of  sampling  monitors  is  that  they  do  not  require 
changes  in  the  operating  system,  but  need  just  one  program 
which  is  called  at  the  monitoring  instances.  This  also  makes 
an  estimation  cf  the  resource  usage  by  the  monitor  easier. 


3.3  Survey  of  Monitors 

In  this  section,  we  survey  some  existing  monitors.  As 
the  actual  measurements  for  the  later  chapters  are  done  on 
an  IBM  370/165-11  system  running  under  MVS,  some  measurement 
tools  available  for  this  system  are  described. 


1 he  System  Management  Facilities _ (SMF) 

The  System  Management  Facilities  [IBM76a]  is  an  event 
trace  monitor  that  records  events  on  the  user  interface 
level.  It  produces  records  at  job  step  termination  and  at 
job  termination  containing  data  about  the  resource  usage  of 
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the  job  steps  and  the  jobs  such  as  CPU  tiie  used,  number  of 
I/O  operations  per  device,  number  of  page  faults,  etc. 
Resource  usage  by  the  user  programs  and  accountable  overhead 
are  not  distinguished.  The  data  that  are  to  be  recorded  can 
be  selected  by  parameters.  Most  of  the  data  do  not  need  to 
be  collected  especially  for  SMF,  but  are  available  from  the 
System  Resources  Manager  (SRM)  [IBM  76d  ],  a  module  of  the 
operating  system.  As  SMF  is  intended  for  analysis  of 
resource  usage  of  user  jobs  and  for  accounting  purposes,  it 
does  not  include  any  unaccountable  systems  overhead.  So  the 
resource  usage  of  all  jobs  does  not  add  up  to  the  total 
resource  usage  of  the  system. 
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The  Resource  Measurement  Facility _ (RMF) 

The  Resource  Measurement  Faclity  [IBM76c]  is  a  sampling 
monitor  on  the  operating  systems  level.  In  intervals  whose 
length  is  specified  by  the  analyst,  it  records  the  counts  of 
certain  system  events  and  resource  usage  during  the 
interval,  and  the  system  state  at  the  sampling  instant.  All 
system  activity,  i.e.,  user  activity  as  well  as  system 
overhead,  is  included.  Most  of  the  data  are  compiled  by  the 
System  Resources  Manager  for  its  own  purposes,  so  that  the 
overhead  for  RMF  is  very  small.  The  data  which  are  recorded 
can  be  selected  by  parameters. 
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viritten  by  the  analyst  can  be  applied  as  well  as  the 
standard  PMF  routines. 


Generalized  Trace  Facility _ (GTjQ 


The  Generalized  -  Trace  Facility  [IBM76h]  is  an  event 
trace  reonitor  for  events  on  the  operating  systems  level. 
Monitor  calls  that  can  be  activated  by  a  system  command  are 
inserted  in  all  important  system  routines.  Events  such  as 
dispatching,  I/O  initiations,  I/O  interrupts,  supervisor 
calls,  etc.,  are  recorded  along  with  time  stamps,  task 
identifiers,  and  contents  of  status  registers.  The  events  to 
be  recorded  can  be  selected  by  parameters.  For  each  event  a 
record  is  written  on  tape.  GTF  is  independent  of  the  host 
operating  system  with  respect  to  I/O  services.  It  handles 
all  I/O  by  itself.  The  event  rate  can  become  extremely 
high,  and,  under  high  workload,  trace  records  may  get  lost 
if  the  buffers  can  not  be  written  fast  enough  on  tape.  In 
benchmark  experiments,  the  CPU  busy  time  can  be  increased  by 
factors  as  high  as  two.  Not  all  GTF  overhead  is  identified 
as  such  by  GTF  [LAZ077a]. 


Due  to  the  large  amount  of  detail  it  can  report,  GTF  is 
a  very  powerful  measurement  tool  which  can  be  used  for  a 
wide  range  of  purposes.  It  does  not  contain  any  reduction 
routines,  but  a  variety  of  special  reduction  routines  for 
systems  tuning  are  available  from  IBM  [IBM  77], 


TO^ON 

TORMON  [CM  73]  is  a  hardware  monitor  for  measurements  on 
the  instruction  level,  developed  at  the  University  of 
Toronto  Computing  Centre  (OTCC) .  It  has  ten  counters  that 
are  usually  attached  to  prewired  probe  points,  but,  in 
principle,  can  be  attached  to  any  accessible  probe  point  in 
the  system.  TORMON  has  a  logic  panel  to  combine  the  signals 
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froB  different  probe  points.  The  resulting  counts  can  be 
read  out  nanually  as  well  as  be  written  into  an  APL 
workspace  via  a  special  interface.  In  the  second  case  there 
is  soae  dead  time  during  the  writing  in  which  no  data  can  be 
recorded.  TOPWON  is  used  regularly  at  OTCC  for  recording 
instruction  counts,  buffer  hit  ratio,  CPO  busy  time,  CPU 
problem  state  time,  etc. 


SLACMON 


SLACMON  [LA  70]  is  a  widely  distributed  sampling  monitor 
on  the  operating  systems  level.  The  CUE  commercial  monitor 
available  from  Boole  and  Babbage  is  very  similar  to  SLACflON. 
SLACMON  counts  system  events  and  samples  operating  system 
variables.  It  is  intended  for  the  analysis  of  resource 
utilization  and  produces  very  detailed  reports  on  system 
oriented  measures  like  utilizations  of  CPU,  channels,  and 
devices,  core  usage,  I/O  interrupts,  etc.  Also,  special 
data  are  provided  about  potential  bottlenecks  in  the 
operating  system.  SLACMON  does  not,  however,  produce  job 
oriented  data  or  data  broken  down  into  job  classes. 


To  minimize  overhead,  the  sampling  intervals  for  the 
various  system  variables  can  be  adjusted  separately.  The 
variables  to  be  monitored  can  be  selected  by  parameters. 
SLACMON  uses  no  I/O  services  during  the  monitoring  period 
but  reduces  all  data  into  tables  in  core  and  formats  and 
prints  them  after  the  measurements  are  terminated.  The 
authors  of  a  paper  on  SLACMON  [LA  70]  suggest  that  about 
three  hundred  samples  are  necessary  to  obtain  a  sufficiently 
small  confidence  interval. 
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3,4  Wcrklcad  Considerations 


Most  perforiaance  measures^  such  as  queue  lengths  or 
response  times,  show  a  behaviour  that  is  not  linear  with  the 
workload.  Thus  it  is  extremely  important  that  conclusions 
be  drawn  from  the  measurements  taken  under  the  appropriate 
workload. 


The  workload  of  a  system  is  "the  total  resource  demand 
generated  by  the  user  community"  [SVOB76].  Usually  the 
workload  changes  with  time,  and  at  most  installations  its 
behaviour  shows  daily,  weekly,  or  even  monthly  cycles.  As 
the  variations  can  be  very  great  it  is  important  to  decide 
under  what  workload  a  system  is  to  be  monitored. 


Due  to  large  variations,  an  average  workload  over  a  long 
period,  i.e.,  over  one  or  more  periodic  cycles,  often  does 
not  describe  the  actual  workload  appropriately  for 
performance  evaluation.  Normally  workload  cannot  be  spread 
out  evenly  over  long  periods. 


A  LYBlcal  workload  would  be  a  workload  averaged  over  a 
precisely  defined  period  during  which  the  real  workload  does 
not  change  very  much,  and  the  statistics  of  which  should  be 
reproducible.  A  definition  of  a  typical  workload  could  be 
"the  workload  during  prime  shift  on  a  workday  at  month's 
end".  A  workload  that  can  be  considered  to  be  typical  for 
an  installation  can  be  found  by  using  high  level  monitoring 
data  which  are  recorded  continuously,  such  as  accounting 
data. 


Peak  workloads  usually  occur  only  during  short  time 
intervals  within  the  workload  cycles.  They,  too,  can  be 
detected  by  accounting  data. 


Sjfnt^tic  workloads  or  benchmarks  are 
constructed  job  streams  that  generate  a  workload 
to  a  specified  actual  workload.  To  construct 
necessary  to  know  the  actual  workload.  The  main 
of  benchmarks  is  that  they  can  be  specified  co 
the  analyst,  and  that  they  are  reproducible  at 
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tine.  They  do  not  necessarily  reflect  a  workload 
encountered  in  the  system. 

For  all  measurements  of  real  workloads,  the  measurement 
periods  have  to  be  long  enough  to  make  sure  that  the 
resulting  data  represent  a  general  situation.  It  may  be 
questionable  whether  periods  of  about  15  minutes,  which  are 
quite  common  for  GTF  traces,  are  sufficiently  long.  On  the 
other  hand,  the  periods  should  be  short  enough  not  to 
include  large  variations  of  the  workload.  This  conflict  can 
be  resolved  by  accumulating  data  from  several  short 
measurement  periods  during  the  same  time  interval  in  the 
periodic  cycle. 
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4.  Paraaeter  Specification  Methodology 


4,  1  Objectives 


Du€  the  extrene  conplexity  of  their  structure  and 
operation,  computer  systems  are  difficult  to  either  describe 
or  comprehend.  This  is  why  abstractions,  such  as  queueing 
network  models,  that  represent  only  the  essential  parts  of  a 
system,  are  used  to  obtain  insight  into  certain  aspects  of 
computer  systems.  The  more  organized  and  systematic  the 
abstraction  procedure  is,  the  easier  it  is  to  apply,  and  the 
more  reliable  the  conclusions  drawn  from  its  results  are. 

The  process  of  modelling  a  computer  system,  starting  out 
with  the  design  of  the  model  and  ending  with  the 
interpretation  of  the  performance  results,  can  be  viewed  as 
a  series  of  distinct  modelling  steps  leading  from  one 
®odellinq  stage  to  the  next.  He  want  to  isolate  these  steps 
and  to  show  their  underlying  assumptions  in  order  to  better 
understand  the  modelling  process.  This  should  lead  to  a 
clearer  and  more  methodical  approach  to  modelling. 


To  gain  insight  into  the  operation  of  a  computer  system, 
we  isolate  the  major  components  that  influence  its 
behaviour.  Establishing  cause  and  effect  relationships  as 
far  as  possible  will  make  it  easier  to  build  an  accurate 
model  of  the  system,  to  implement  system  changes  in  the 
model,  and  to  interpret  the  available  performance  measures. 

By  a  systematic  approach  to  parameter  specification,  we 
obtain  clearly  defined  parameter  sets  and  transformation 
procedures.  Based  on  these  parameter  sets,  we  can  determine 
the  system  variables  that  must  be-measured  and  select  the 
appropriate  monitoring  tools. 

In  the  entire  modelling  process,  from  the  conception  of 
the  model  to  the  evaluation  of  the  results,  three  major 
phases  can  be  identified:  the  design  phase,  the  validation 
phase,  and  the  prediction  phase. 


60 


In  the  design  phase,  we  select  the  type  of  the  queueing 
network  ncdel  according  to  the  questions  we  intend  it  to 
answer.  This  selection  is  based  on  a  coaprehensi ve ^ 
qualitative  description  of  the  actual  computer  system  that 
covers  at  least  the  following  points: 

•  the  system  configuration 

•  the  logic  of  the  operating  system 

•  the  workload  of  the  system 

•  the  monitoring  facilities 

•  information  about  the  operating  aspects  of  the 
computing  center,  such  as  shift  patterns,  etc. 

As  well,  the  parameter  requirements  of  the  intermediate 
stages  and  the  queueing  models  are  determined.  (The 
intermediate  stages  will  be  defined  in  section  4.2.) 

In  the  val idat ion  phase .  the  queueing  network  model  is 
inplemented  using  several  distinct  abstraction  steps  that 
lead  from  one  modelling  stage  to  the  next.  The  steps 
involved  are: 

•  the  measurements  of  the  actual  computer  system 

•  the  specification  of  an  intermediate  model  that 

represents  the  system  on  a  logical  level, 

•  the  specification  of  the  input  parameters  to  the 
solution  algorithm,  and 

•  the  computational  solution  of  the  model. 

If  it  can  be  assumed  that  inaccuracies  in  the  parameter 
estimation  have  lead  to  errors  in  the  performance  measures, 
seme  of  the  parameters  may  be  calibrated. 

In  the  prediction  phase,  the  proposed  system  changes  are 
implemented  and  the  desired  performance  predictions  are 
derived. 

In  order  to  describe  the  design  phase,  in  which  the 
models  are  selected  and  the  parameter  requirements  of  the 
modelling  stages  are  determined,  we  have  to  know  what  these 
stages  are.  Sc,  before  the  entire  modelling  procedure  is 
outlined  in  section  4, 3,  the  abstraction  steps  of  the 
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validation  phase  and  the  interaediate  stages  resulting  from 
these  steps  are  described  in  lore  detail  in  section  4«2.  In 
section  4*4  the  proposed  aodelling  methodology  will  be 
demonstrated  by  an  example. 


4.2  The  Steps  and  the  ^Stages  of  the  Validation  Phase 

Figure  1  shows  the  steps  and  the  stages  of  the 
validation  phase.  The  stages  are  indicated  by  the  boxes. 
The  steps  are  symbolized  by  the  arrows. 


4.2,1  Measurements 


In  the  first  step,  the  monitoring  experiments  are 
performed  giving  the  measurement  data.  In  most  cases 
several  mcnitcrs  are  necessary  to  collect  all  the  data 
needed.  A  possible  combination  for  an  IBM  /370  MVS  system 
is,  fcr  example,  the  software  monitor  RMF,  giving  system  and 
device  related  data,  and  the  software  monitor  SMF,  recording 
task  related  data,  A  hardware  monitor  may  be  used  to  obtain 
some  of  the  data  that  are  not  easily  accessible  via 
software,  and  to  estimate  the  overhead  due  to  the  software 
monitors. 


For  a  detailed  modelling  study,  several  monitoring 
experinents  should  be  performed.  The  measured  workloads 
must  be  typical  rather  than  exceptional.  Also,  measurements 
should  be  performed  under  various  workloads  to  obtain 
information  about  the  load  dependence  of  the  system's 
behaviour.  Finally,  the  system  should  be  measured  by  a 
hardware  monitor  with  and  without  the  software  monitors 
using  an  identical  workload  (if  possible)  in  order  to 
estimate  the  amount  of  software  monitor  overhead  included  in 
the  measurement  data. 


62 


Coiputer  I 

Systesi  i 


I 

i 


Measureients 


Calibration 
- > - 


Parameter  Estimation 


System  Hodel 


Model  Workload 


Program  1  1 

Behaviour  |  | 


Interference 

Pattern 


Multiprogramming  Mix 


Resource 

Attributes 


I 

! 


Parameter  Representation 


I  Queueing  Network  | 
I  Model  Parameters  | 

I 


i 


Solution  Algorithm 


I  Performance 
I  Measures 


1 

I 

I 

i 


Figure  1.  Validation  phase. 


Measurement  Data 


The  measurement  experiments  produce  lar 
data  that  cannot  directly  be  used  as  input 
These  data  are  not  necessarily  related 
functions  of  the  computer  system.  The 
intervals,  percentages,  event  counts,  etc, 
related  to  each  other  and  interpreted  in 


ge  amounts  of  raw 
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to  the  logical 
y  specify  time 
,  that  have  to  be 
order  to  get  a 
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picture  of  the  operation  of  the  system.  Often 
rement  data  are  aggregate  quantities^  such  as  the  CPU 
of  a  job,  which  is  an  aggregate  of  the  number  of 
tions  of  the  job  and  the  processing  speed  of  the  CPU. 

trace  data  contains  only  information  about  single 
s  without  any  relation  to  other  events.  They  have  to 
rocessed  to  obtain  the  required  statistics.  If  the 
rement  data  contains  much  monitoring  overhead,  it  must 


djusted  to  reflect  the 
ra  as  closely  as  possible. 


unperturbed  operation  of  the 


4.2.2  Parameter  Estimation 


Parameter  estimation  is  the  step  from  the  measurment 
data  to  the  logical  system  model.  The  measurements  are 
interpreted  and  related  to  each  other.  Aggregated  data  is 
separated  into  its  components.  In  compacting  highly 
detailed  data  like  event  trace  records  in  order  to  obtain 
the  statistics  required  by  the  system  model,  as  much  useful 
detail  as  possible  should  be  retained. 


Different  monitors  do  not  always  measure  compatible  sets 
of  system  variables.  For  example,  RMF  gives  the  numbers  of 
physical  I/O  operations,  SIOs,  whereas  SMF  counts  the  number 
of  I/O  requests  by  tasks,  EXCPs  [IEM76d].  Such 
inconsistencies  have  to  be  treated  by  recognizing  them  and 
developing  functions  that  express  one  variable  in  terms  of 
the  other. 


System  Model 

In  the  syst^  model .  all  essential  parts  and  functions 
of  the  actual  computer  system  are  represented  from  a  logical 
point  of  view.  They  are  looked  upon  as  service  facilities 
for  user  programs.  Service  facilities  need  not  exclusively 
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be  hardware  components,  but  may  also  include  operating 
system  procedures  or  data  sets.  The  system  model  must 
certainly  cover  all  the  details  represented  explicitly  in 
the  queueing  network  model.  Eut  it  may  also  contain  details 
that  are  represented  only  implicitly  in  the  queueing  network 
model.  For  example,  I/O  channels  and  devices  may  be 
represented  explicitly  in  the  system  model,  but  in  the 
queueing  network  model  they  are  often  represented  by 
aggregate  servers.  Beyond  the  details  required  for  the 
queueing  network  model,  the  system  model  should  at  least 
cover  all  the  details  necessary  to  represent  the  proposed 
system  changes  explicitly.  In  most  cases,  the  system  model 
is  not  a  solvable  queueing  model,  as  it  may  contain  aspects 
of  the  real  system  that  can  not  be  directly  represented  in 
the  intended  queueing  models.  Arbitrary  service  time 
distributions  or  complex  dispatching  disciplines  are 
examples  of  such  features.  The  system  model  is  introduced 
to  describe  separately  the  major  components  that  influence 
the  system  performance.  It  should  provide  a  highly  detailed 
model  that  allows  an  easy  representation  of  system  changes. 
It  also  gives  insight  into  the  operation  of  the  computer 
system.  Most  solvable  queueing  models  are  restricted  too 
much  by  their  mathematical  aspects  to  serve  these  purposes. 

There  are  two  strategies  for  establishing  the  system 
model.  It  can  be  designed  to  be  as  detailed  as  the 
monitoring  tools  and  the  time  available  for  measurements 
permit.  A  highly  detailed  system  model  can  be  used  as  a 
basis  fcr  a  wide  range  of  models,  not  necessarily  restricted 
to  queueing  models.  If  it  is  updated  from  time  to  time,  it 
can  ke  employed  for  continued  observation  of  the  system's 
performance.  Alternatively,  the  system  model  can  be 
designed  as  a  minimal  model,  just  supplying  the  parameters 
necessary  for  the  intended  queueing  network  model.  This 
approach  keeps  the  measurement  and  modelling  effort  small 
when  cnly  one  queueing  model  is  to  be  built. 
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The  systei  lodel  consists  of  se¥eral  parts^  each 
representing  one  aspect  of  the  conputer  systea* 

The  proqraa  behaviour  aodel  characterizes  the  behaviour 
and  the  service  requireaents  of  the  user  programs. 

The  interference  pattern  captures  all  interference  among 
user  tasks  and  the  operating  systea. 

The  Bultiproqraaainq  aix  specifies  the  job  mix  under 
which  the  aodel  is  to  be  evaluated. 

The  aodel  workload  specifies  the  workload  for  the 
modelled  system  resources.  It  is  derived  from  the 
program  behaviour  model,  the  interference  pattern,  and 
the  multiprogramming  mix. 

The  resource  attributes  describe  the  properties  of  the 
modelled  service  facilities  and  the  manner  i;i  which  they 
process  the  service  requests  of  the  user  tasks. 


Program  Behaviour  Model 


The  program  behaviour  model  captures  the  dynamic 
behaviour  of  user  programs  in  the  computer  system  by 
specifying  the  patterns  of  their  service  demand  on  the 
system's  resources  like  logical  CPU  burst  distributions,  the 
probabilities  of  transitions  from  one  server  to  another, 
etc.  Data  sets  and  certain  operating  system  procedures,  as 
well  as  hardware  components  of  the  system,  may  be  viewed  as 
resources.  In  addition,  the  program  behaviour  model 
contains  the  total  service  demand  of  the  user  programs  on 
each  service  facility.  To  achieve  hardware  independence, 
the  service  demand  is  specified  in  numbers  of  operations 
rather  than  service  times  or  total  times  spent  at  devices. 

In  order  to  obtain  the  desired  isolation  of  the  user 
ptcgrams  from  the  operating  system,  only  service  requests 
made  directly  by  the  user  programs,  for  instance  the  logical 
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CPU  bursts  and  the  file  EXCPs,  should  be  included.  All 
interference  froa  the  operating  system  or  from  other  user 
tasks,  such  as  swaps  or  I/O  interrupts,  should  be 
eliminated.  This  is  not  always  possible.  The  paging 
behaviour  cf  a  program,  for  example,  is  determined  by 
program  properties  like  the  locality,  as  well  as  by  system 
and  workload  parameters  such  as  the  memory  management  policy 
and  the  paging  activity  of  currently  active  tasks.  A 
certain  isolation  can  be  achieved  by  specifying  the 
operating  system  activities  as  functions  of  the  user  program 
program  behaviour,  as  is  done  in  the  interference  functions. 

In  multiple  class  models,  each  class  has  its  own  program 
behaviour  model.  To  obtain  a  detailed  and  accurate  model, 
the  user  jobs  should  be  classified  according  to  their 
behaviour  and  their  service  reguire ments .  However,  the  only 
class  distinction  in  the  measurement  data  that  can  be 
obtained  easily  is  the  breakdown  according  to  the  job 
classes  assigned  by  the  installation.  In  most 
installations,  this  classification  follows  administrative 
criteria  that  normally  do  not  coincide  very  well  with  a 
classification  according  to  distinct  behaviour  patterns 
[LEST73].  The  program  behaviour  model  relies  very  heavily 
on  the  accounted  service  requests,  i.e.,  the  service 
reguests  attributed  to  user  tasks  by  the  monitors. 


Interference  Pattern 


User  programs  do  not  see  the  hardware  directly. 
Instead,  they  see  a  virtual  machine  that  consists  of  the 
operating  system  and  the  hardware.  Queueing  network  models 
normally  represent  only  the  hardware  components  of  a  system. 
The  gap  between  the  user  programs  as  described  in  the 
program  behaviour  model  and  the  hardware  that  is  represented 
in  the  gueueing  network  models  is  bridged  by  the 
interference  functions.  The  interference  among  user  tasks 
and  the  operating  system  has  two  major  aspects:  the  dyna  mic 


67 


i Dter f erence  function s#  reflecting  the  impact  on  the  dynamic 
behaviour  of  the  tasks,  and  the  static  interference 
functions  that  nap  the  logical  service  requests  into 
physical  operations  and  account  for  the  operating  system 
overhead. 

The  dynamic  interference  functions  describe  the  changes 
in  the  behaviour  of  user  tasks  due  to  other  user  tasks  and 
the  operating  system.  It  captures,  for  example,  the 
difference  between  logical  and  physical  CPU  bursts.  Moore 
reported  that  the  logical  CPU  bursts  of  the  system  he 
measured  were  hyperexponent ially  distributed,  but  the 
physical  CPU  bursts  were  exponentially  distributed  due  to 
the  LCFSPR  discipline  at  the  CPU  [H00R71].  Other  dynamic 
interference  phenomena  are  swapping  to  balance  the  load, 
time  slicing  disciplines,  etc. 

These  effects  are  extremely  difficult  to  identify,  and 
there  are  virtually  no  reports  in  the  literature  about  how 
they  could  be  determined  quantitatively.  One  possible 
method  could  be  to  record  detailed  event  traces  in  a 
ccntrclled  environment  varying  the  multiprogramming  mix 
systematically.  From  the  resulting  statistics,  interference 
functions  with  the  multiprogramming  mix  as  a  variable  could 
be  derived.  These  would  allow  extrapolation  of  the  findings 
to  situations  for  which  no  experiments  can  be  run.  The 
dynamic  interference  has  not  yet  been  treated  in  practical 
modelling.  In  the  case  studies  published  so  far  all  the 
models  have  assumptions  equivalent  to  the  assumption  of 
local  balance.  For  locally  balanced  models  the  dynamic 
behaviour  of  the  tasks  is  irrelevant  [ KR  75,  GIAM77].  Only 
the  overall  workload  placed  on  the  servers  as  expressed  by 
the  workload  vector  is  important,  not  the  lengths  of  single 
bursts  or  transition  probabilities. 

The  static  interference  functions  map  the  number  of 
accounted  service  requests  to  the  resources  of  the  system 
into  the  physical  operations  of  the  system’s  hardware.  They 
have  three  major  aspects: 
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•  They  aap  the  logical  service  requests  of  tasks  into 
the  directly  corresponding  and  accountable  physical 
operations,  for  instance  EXCPs  into  SIOs. 


•  They  distribute  all  unaccounted  resource  usage  among 
the  user  classes. 


•  They  attempt  to  establish  functions  between  the 
resource  usage  by  the  operating  system,  the  overhead 
in  a  general  sense,  and  user  workload  parameters. 


These  aspects  are  certainly  highly  interdependent,  and 
they  cannot  be  separated.  However,  it  is  helpful  to  keep 
them  in  mind  when  designing  the  static  interference 
functions. 


In  order  to  specify  the  static  interference  functions  in 
detail,  the  amount  of  operating  system  activity  connected 
with  the  various  system  services  must  be  determined. 
General  cause  and  effect  relationships  must  be  established. 
It  is  most  important  to  find  the  ^correct”  independent 
variables  of  the  functions.  The  mapping  of  user  data  set 
EXCPs  into  the  corresponding  SIOs,  for  instance,  may  be  a 
constant  ratio,  whereas  the  CPU  overhead  is  probably 
dependent  on  the  multiprogramming  mix,  the  number  of  page 
faults,  and  similar  parameters.  These  dependencies  should 
be  established  according  to  the  operating  system  logic  and, 
where  this  is  not  possible,  by  making  assumptions  about 
their  nature.  They  must  be  quantified  by  measurements  taken 
under  varying  loading  situations  of  the  system,  and  they 
should  be  verified  by  modelling  experiments  using 
measurement  data  taken  under  a  workload  different  from  those 
used  for  establishing  the  interference  functions. 


Ancther  important  point  is  how  the  overhead  is 
represented  in  the  queueing  network  model.  The  approach  in 
which  a  special  overhead  class  is  used  has  several 
shortcomings.  The  resource  demand  of  the  special  class  has 
to  be  made  dependent  on  the  multiprogramming  mix  and  on  the 
program  behaviour  models  of  the  other  classes.  So  for  each 
multiprogramming  mix,  the  resource  demand  of  the  overhead 
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class  has  to  be  calculated  separately*  If  this  is  not  done, 
the  overhead  is  modelled  as  a  constant  workload  that  is 
completely  independent  of  the  user  classes.  This  is 
certainly  not  a  good  assuuption.  Also,  the  performance 
measures  of  an  overhead  class  are  meaningless.  The  only 
situation  in  which  parts  of  the  overhead  should  be  modelled 
as  a  separate  class  -is  when  there  is  a  large,  clearly 
identifiable  system  task  that  gets  special  treatment.  In 
all  other  cases,  the  overhead  should  be  distributed  among 
the  user  jobs.  An  alternative  method  of  connecting  overhead 
with  the  user  tasks  can  be  applied  in  models  whose  solution 
algorithms  allow  class  changes.  Users  requesting  certain 
services  change  into  an  operating  system  class,  and  after 
the  completion  of  the  operating  system  action,  they  change 
back  again  into  their  original  user  class  [ KGT  77]. 


fl 1  3 ® ® i nq  Mix 


The  multiprogramming  mix  is  a  vector  that  indicates,  for 
each  class,  the  number  of  active  jobs.  For  a  single  class, 
it  is  also  called  mult i pro gra m ming  level, 
particularly  for  hierarchical  models, 
multiprogramming  mixes  has  to  be  specified  representing  all 
potential  loading  situations  of  the  computer  system. 


In  many  cases, 
a  set  of 


Model  Workload 

From  the  program  behaviour  model,  the  interference 
functions,  and  the  multiprogramming  mix,  the  model  workload 
is  determined.  It  indicates  for  each  resource  represented 
in  the  queueing  network  model  the  total  service  demand.  The 
model  workload  is  hardware  independent,  in  that  the  service 
demand  placed  on  hardware  components  is  specified  in  numbers 
of  physical  operations. 
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*  Workload*  should  not  always  be  taken  literally*  It  may 
in  sole  cases  just  Bean  'delay*  as,  for  exaaple,  with 
respect  to  terainals,  where  it  neans  user  think  tiBe*  The 
coBputer  system  does  not  perform  any  work  for  a  task  that  is 
at  a  terminal.  For  the  sake  of  uniformity,  however,  the 
term  workload  is  also  applied  to  these  cases. 


Resource  Attributes 

The  resource  attributes  capture  the  resource- rela ted 
aspects  of  the  system  model.  They  contain  the  service 
disciplines  of  all  modelled  resources.  For  hardware 
components,  they  cover  also  the  capacities,  specified  in 
numbers  of  operations  per  unit  time.  Load  dependent 
capacities  of  service  facilities  like  disk  drives  or  entire 
I/O  subsystems  are  described  by  a  set  of  capacity  values, 
one  for  each  possible  loading  situation. 


Stability  of  the  System  Model  Parameters 

If  the  separation  of  the  major  components  of  influence 
on  the  system's  behaviour  is  done  carefully,  most  changes  in 
the  actual  system  should  affect  only  one  component  of  the 
system  model.  So,  once  a  system  model  is  established,  only 
a  few  extra  measurements  should  be  necessary  to  build  a 
model  of  a  particular  aspect  of  a  computer  system.  If  a  new 
CPU  is  to  be  installed,  for  example,  it  should  only  be 
necessary  to  update  the  CPU  capacity. 

One  must,  however,  be  aware  of  changes  in  the  computer 
system,  that  are  not  obvious.  Changes  in  the  charging 
policy  might  affect  the  program  behaviour  model,  or  changes 
in  the  interactive  part  of  the  operating  system  might  not 
only  affect  the  program  behaviour  of  interactive  tasks  but 
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also,  due  to  shifts  of  users  aaonc  classes,  the  progran 
behaviour  of  other  classes. 


4.2.3  Paraaeter  Representation 

Parameter  representation  is  the  mapping  of  the  system 
model  parameters  into  the  input  parameters  of  the  queueing 
network  models  for  which  solution  algorithms  exist.  From 
the  system  model  parameters,  it  has  to  be  decided  what 
system  components  will  be  represented  in  the  queueing 
network  model,  and  how  they  will  be  represented.  Components 
that  are  only  slightly  utilized  may  not  need  to  be  included 
in  the  queueing  network  model  at  all,  whereas  heavily 
utilized  components  should  be  modelled  as  accurately  as 
possible.  After  these  decisions,  the  detailed  functional 
parameters  of  the  system  model  are  aggregated  to  the  gross 
parameters  of  the  servers  in  the  queueing  network  model  as 
required  by  the  solution  algorithm. 


Queueing  Network  Model 


The  parameter  requirements  of  a  queueing  network  model 
are  determined  by  its  type  and  by  the  computational  solution 
algorithm  selected.  The  parameters  do  not  necessarily 
reflect  the  operation  of  the  modelled  computer  system  very 
explicitly.  For  example,  the  input  parameters  for  models  in 
local  balance  may  be  any  consistent  set  of  relative 
utilizations  of  the  modelled  components,  while  in  models 
using  a  global  balance  technique,  a  component  of  the 
computer  system  may  be  represented  by  a  complex  server 
structure  in  order  to  model  its  service  time  distribution 
more  accurately  [COX55]. 
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^,2.4  Model  Solution 


The  solution  algorith*  is  the  coaputational  algorithm  by 
which  the  parameters  of  the  queueing  network  model  are 
transformed  into  the  desired  performance  measures* 


Performance  Measures 


The  performance  measures  are  the 
phase.  They  nay  be  interpreted  using 
actual  computer  system  and  the  system 


goal  of  the  validation 
the  description  of  the 
model. 


4.2.5  Calibration 


If  the  performance  measures  resulting  from  the 
evaluation  of  the  model  do  not  agree  with  the  corresponding 
measured  system  variables,  a  calibration  of  some  of  the 
system  model  parameters  nay  be  done.  The  parameters  that 
are  assumed  to  be  least  accurate  are  adjusted  by  using  a 
calibration  factor,  so  that  the  results  of  the  model  agree 
with  the  corresponding  measurement  data  [BOSE76].  The 
resulting  calibration  factors  must  then  be  applied  as  well 
in  the  subsequent  prediction  phase.  If,  however,  the 
calibrated  parameters  have  to  be  changed  due  to  projected 
system  changes,  the  predictions  may  be  not  very  reliable. 
In  any  case,  the  necessity  of  a  calibration  indicates  that 
some  important  aspects  of  the  system  have  not  been  modelled 
on  the  appropriate  level  of  detail.  Calibration  should  only 
be  applied  if  a  more  accurate  model  is  not  feasible. 
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4.3  The  aodellina  Procedure 


In  desi9nin9  a  nodel,  we  start  by  lookln9  at  the  last 
stage  of  the  validation  phase:  we  ask  the  guestions  the 
Bcdel  should  answer,  or,  more  precisely,  we  deteraine  which 
perforaance  aeasures  the  model  should  predict.  Then,  based 
on  the  qualitative  system  description  and  considering 
possible  solution  algorithms,  we  select  a  queueing  model 
type  that  can  produce  these  predictions.  Knowing  the 
parameter  requirements  of  the  queueing  network  model,  we  can 
define  the  system  model  with  sufficient  detail  to  supply 
these  parameters.  From  the  system  model  we  can  determine 
what  measurement  data  are  necessary  to  specify  its 
parameters.  So,  in  the  model  design  phase,  we  work 
•backward*  over  the  stages  of  the  validation  phase  to  define 
the  parameter  and  measurement  data  requirements.  This  way 
we  make  sure  that  each  stage  provides  all  the  data  the  next 
stage  needs. 

In  the  model  validation  phase,  we  work  ‘forward*  as 
described  in  section  4.2  and  apply  the  transformation  steps 
to  the  measurement  data  to  derive  the  performance  measures 
and,  if  necessary,  to  calibrate  the  system  model  parameters. 

In  the  prediction  phase,  the  changes  whose  impact  on  the 
performance  of  the  computer  system  are  to  be  analyzed  are 
iffplemented  in  the  system  model.  This  step  is  also  called 
perturbation.  Side  effects  of  the  changes  in  the  computer 
system  demand  special  attention.  A  new  operating  system, 
for  example,  may  decrease  the  high  speed  cache  hit  ratio  by 
several  percent  due  to  less  efficient  locality  patterns  in 
the  code,  and  thus  decrease  the  overall  CPU  speed.  The 

parameter  representation  from  the  perturbed  system  model 

{ 

again  must  be  done  very  carefully,  as  some  of  the  decisions 
made  in  the  parameter  estimation  of  the  validation  phase  may 
have  to  be  reconsidered  due  to  the  changes  in  the  system 
model.  Finally,  the  solution  algorithm  is  applied,  and  the 
predicted  performance  measures  are  calculated. 


74 


4,4  A  Single  Class  Model 


In  the  following^  we  will  design  and  validate  a  sinple 
single  class  aodel.  The  purpose  of  this  model  is  to  serve 
as  an  example  by  which  the  methodology  developed  above  can 
be  demonstrated,  and  as  a  preliminary  study  for  the  multiple 
class  model  in  chapter  5, 


4.4.1  Design  Phase 

Qualitative  Description  of  the  Computer  System 

The  computer  system  to  be  modelled  is  the  IBM  S/370-165- 
11  of  the  University  of  Toronto  Computing  Centre  (UTCC) .  It 
runs  under  0S/VS2  (MVS),  a  virtual  memory  system,  supplying 
mainly  batch  service.  A  time  sharing  service  (TSO)  is  also 
available.  The  unit  record  I/O  is  handled  by  a  spooling 
system  (JES2) .  For  temporary  files  the  Virtual  I/O  facility 
(VIO)  can  be  used;  virtual  storage,  that  is  organized  like 
the  virtual  storage  for  programs,  is  allocated  to  temporary 
files.  This  way,  I/O  operations  on  frequently  accessed 
records  can  be  saved. 

The  main  frame  of  the  system  has  4  million  bytes  main 
memory  and  a  high  speed  cache  memory  of  16  thousand  bytes. 
The  access  times  for  the  high  speed  cache  and  the  main 
memory  are  80  nanoseconds  and  2  microseconds  respectively. 
The  peripheral  equipment  is  connected  by  5  channels.  The 
unit  record  equipment  consists  mainly  of  card  reader  and 
line  printer  terminals  on  several  different  locations  on  the 
campus.  The  configuration  of  the  system  is  outlined  in  the 
figure  2. 
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Figure  2,  System  configuration. 


The  System  Resources  Hanager  (SBM)  of  the  operating 
system  tries  to  keep  the  overall  load  of  the  activated  jobs 
balanced  by  initiating  or  swapping  jobs  as  required,  and  by 
dynamically  assigning  dispatching  priorities.  At  the  same 
time,  it  attempts  to  distribute  the  resources  among  the 
active  jobs  according  to  a  predetermined  pattern.  Three 
time  intervals  defined  in  the  operating  system  are  important 
from  a  modelling  point  of  view.  The  resident  time  of  a  task 
is  the  time  its  working  set  is  in  main  memory,  i.e.,  the 
time  it  belongs  to  the  multiprogramming  mix.  The  elapsed 
time  of  a  task  is  the  time  from  its  initiation  until  its 
termination.  Finally,  the  ti me-in~system  of  a  task  is  the 
time  from  when  it  is  read  into  the  system  until  its 
termination.  The  memory  management  uses  a  global,  variable 
partition  strategy,  which  means  that  tasks  can  steal  pages 
of  main  memory  from  each  other,  and  the  partition  size  and 
the  number  of  active  partitions  are  variable.  The  page 
replacement  algorithm  is  the  least  recently  used  (LRU) 
algorithm. 

The  locations  of  the  system  data  sets  that  are  required 
for  the  distribution  of  their  usage  among  the  user  tasks  are 
shown  in  table  2. 


Table  2:  Systen  data  sets 


address 


contents 


351 

355 

356 
451 


spooling  data  set 

■aster  catalog  and  local  page  data  set 

swap  data  sets 

pagable  link  pack  area 

coanon  area 

local  e  data  set 

TSO  ca  og 

systei  catalog 


453 

454 
4D1 
4D2 


The  monitors  SMF,  BMF,  and  TORMON  are  the  aeasurement 
tools  available  that  give  data  useful  for  modelling.  They 
were  described  in  chapter  3. 

The  workload  of  the  system  consists  mainly  of  three 
types  of  jobs: 

•  compute  oriented  batch  jobs  (General  Purpose  Job 
Stream) ,  GPJS 

•  small,  students*  jobs  (High  Speed  Job  Stream) ,  RSJS 

•  interactive  TSO  jobs,  TSO 

The  system  will  be  measured  under  a  benchmark  that  has 
one  jotstream  for  each  of  these  three  classes.  The  TSO 
jcbstream  is  generated  by  simulator  that  uses  a  TSO  command 
script.  The  benchaark  is  not  particularly  designed  for 
modelling  purposes,  but  it  is  used  for  systems  tuning  in  a 
general  context. 


Performance  Measures 

With  the  example  model  we  want  to  calculate  the 
utilizations  of  the  major  hardware  components,  the  mean 
queue  lengths,  and  the  job  throughput.  These  performance 
measures  are  all  system  related.  Dser  related  performance 
measures  would  be  the  mean  time  in  system  for  user  jobs  or 
the  mean  response  time  for  interactive  jobs.  The  workload 
of  the  system  is  assumed  to  be  homogeneous  enough  to  allow  a 
model  without  class  distinctions  if  only  system  related 
performance  measures  are  to  be  calculated. 
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Queue! ng  Network  Model  Definition 


For  calculating  the  perfornance  aeasures  described 
above,  we  can  use  a  single  class,  closed  general  queueing 
network  aodel  in  local  balance.  The  classification  vector 
of  the  nodel  is: 

Model  Type  =  closed  general  network  model 

single  class 

local  balance  disciplines 

workload  vectors 

load  independent  servers 

For  its  solution,  Buzen's  algorithm  can  be  applied.  The 
parameter  requirements  are: 

N:  the  multiprogramming  level 

H:  the  number  of  servers 

xi:  the  relative  utilizations  of  the  servers 

i  ^  1 ,  .  . .  ,  H 

The  direct  results  are  the  utilizations  of  the  servers 
and  the  mean  queue  lengths.  From  these  quantities,  the  job 
throughput  and  the  mean  time-in-system  of  the  tasks  can  be 
calculated.  It  is  questionable  whether  the  mean  time-in- 
system  is  a  useful  performance  measure  for  a  single  class 
model  as  the  actual  times-in-system  may  have  a  very  great 
variance  [IAZ077b]. 

The  model  will  be  implemented  using  a  logical  approach 
cn  the  operating  systems  level.  Spooling  will  not  be 
included.  The  CPU  and  the  disk  I/O  usage  of  the  spooling 
task  will  be  distributed  among  the  user  tasks.  The  dynamic 
interaction  of  the  spooling  task  and  the  unit  record  devices 
with  the  rest  of  the  system  is  assumed  to  be  negligible. 

Only  hardware  resources  will  be  represented  explicitly 
in  the  queueing  network  model.  The  software  service 
facilities  like  paging  and  swapping  will  be  represented  on  ly 
implicitly  as  service  demand  on  the  hardware  resources. 
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Systei  Model 


In  the  following  sections,  the  conponents  of  the  system 
model  are  defined  in  a  *  backward*  procedure*  First,  the 
resource  attributes  and  the  model  workload  parameters  are 
defined  to  supply  the  input  parameters  for  Buzen’s 
algorithm,  the  relative  utilizations.  Next,  the  required 
program  behaviour  parameters  and  the  ways  of  specifying 
interference  pattern  and  the  multiprogramming  level  are 
determined. 


Resource  Attributes 


The  hardware  resources  represented  in  the  system  model 
are  the  CPU,  the  channels  3  and  4,  and  the  disks  connected 
tc  these  channels.  The  channels  1  and  2  are  not  used  by  the 
benchmark,  and  channel  5  is  not  represented  in  the  model 
because  it  connects  only  unit  record  devices  that  are 
handled  by  the  spooling  system  and  interactive  terminals 
that  need  not  be  represented  on  this  level.  The  disk  usage 
of  the  spooling  system,  however,  is  captured  by  the  syste 
model.  The  paging  process,  the  swapping  process,  and  the 
VIC  facility  are  represented  explicitly  as  software 
resources  in  the  system  model,  but  not  in  the  queueing 
network  model. 

The  service  disciplines  of  the  modelled  resources  are 
assumed  to  be  local  balance  disciplines.  In  a  single  class 
model,  it  is  not  important  which  of  the  disciplines  FCFS, 
PS,  or  LCFSPR  are  assumed,  because  their  representation  in 
the  queueing  network  model  is  identical.  For  each  hardware 
component  of  the  model,  the  capacity  must  be  derived  from 
the  measurement  data.  This  can  be  done  according  to  the 
following  formula: 

opU)_ 

Tui" 


c  (i) 
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Op  (i) :  nuaber  of  operations  deyice  i  performed  in  the 
■esareient  interval 

T(i);  busy  tiie  of  deyice  i  daring  aeasurement 
interval 

T  (i)  can  also  be  replaced  by  the  product  T  ♦  0(i),  where 
U  (i)  is  the  utilization  of  device  i  and  T  is  the  length  of 
the  measurement  inter'val. 


Model  Workload 

The  model  worltload  must  be  specified  in  such  a  way  that 
the  parameters  of  the  solution  algorithm  can  be  derived 
easily.  Buzen*s  algorithm  has  as  its  major  input  parameters 
the  relative  utilizations.  The  relati ve  workload  of  a 
server  is  defined  as  the  product  of  its  relative  utilization 
and  its  capacity.  Its  dimension  is  number  of  physical 
operations.  The  relative  workload  is  used  to  provide  a 
normalized  workload  measure  that  is  hardware  independent. 
It  can  be  expressed  as: 

B<i)  =  X(i)  ♦  C(i) 

X(i):  relative  utilization  of  device  i 
B(i):  relative  workoad  of  device  i 
C(i):  capacity  of  device  i 

As  mentioned  previously,  there  are  several  equivalent 
ways  of  specifying  a  consistent  set  of  relative  utilizations 
for  the  input  to  solution  algorithms  of  local  balance 
models.  The  server  capacity  being  an  absolute  value,  these 
ways  differ  only  in  the  normalization  factor  of  the  relative 
workload.  In  the  following,  four  intuitively  obvious 
definitions,  whose  equivalence  can  be  shown  easily,  will  be 
outlined.  These  considerations  are  based  on  ideas  described 
in  [ KR  75,  GIAM77  ]. 

Let  W  (i)  be  the  relative  workload  of  server  i,  specified 
as  the  number  of  operations  and  determined  subject  to  a 
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Doraalization  factor.  Let  L(i)  be  the  total  nunber  of 
operations  the  system  component  i  has  executed  during  the 
measurement  interval  T.  Then  the  relative  workload  for 
server  i  can  be  defined  on  four  different  levels  of  detail: 

a)  total  load  during  T:  H  (i)  =  L(i) 

b)  load  per  partition:  H(i)  =  L(i)  /  MPL 

c)  load  per  job:  H(i)  =  L  (i)  /  Jobs 

d)  load  per  cycle:  »(i)  =  L(i)  /  Cycles 

nPL:  mean  multiprogramming  level 

Jobs:  number  of  jobs  completed 

Cycles:  total  number  of  cycles  through  the 
central  server  network  network 
(=  total  number  of  I/O  operations) 

The  workload  vector  of  the  model  contains  as  its  elements 

the  relative  workload  of  each  device. 

As  can  easily  be  verified,  these  definitions  differ  only 
in  their  normalization  factors.  The  normalization  factor, 
however,  is  irrelevant  to  the  model  as  long  as  the  numbers 
are  small  enough  not  to  cause  an  overflow  during  the 
execution  of  Buzen's  algorithm.  The  normalization  factor  is 
completely  independent  of  any  other  model  parameter,  so  that 
the  most  convenient  definition  can  be  chosen.  From  a 
mathematical  point  of  view  the  four  definitions  of  the 
relative  workload  are  equivalent  under  the  assumptions  made 
for  local  balance  models.  The  mean  multiprogramming  level, 
the  number  of  jobs  completed,  and  the  total  number  of  I/O 
requests  have  to  be  determined  fcr  other  aspects  of  the 
model  as  well.  So  the  definitions  are  also  equivalent  in 
their  measurement  requirements:  all  four  definitions  can  be 
used  with  the  same  set  of  measurement  data.  As  the  relative 
wcrkload  will  be  derived  from  the  program  behaviour  model, 
we  choose  the  job  oriented  definition,  (c)  , 
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Prograa  Behaviour  Model 


For  locally  balanced  models  the  program  behaviour  model 
can  ignore  all  dynamic  aspects  of  a  task's  behaviour.  It  is 
not  necessary  to  determine  the  length  of  the  bursts  a  job 
spends  at  a  device  or  the  probabilities  for  the  transitions 
of  the  jobs  from  one  device  to  another  [ KR  75,  GIAM77]. 
Because  the  model  workload  is  fully  determined  by  the 
workload  vector,  the  progam  behaviour  model  must  only 
contain  the  total  number  of  service  requests  of  an  average 
job  at  each  service  facility. 


service  facility 

CPU 

file  I/O 
VIO 


workload  dimension 

instructions 

I/O  requests  per  channel  and  device 
VIO  requests 


Interference  Pattern 

For  local  balance  models  the  dynamic  interference  is 
irrelevant,  as  the  dynamic  behaviour  of  user  tasks  is  not 
represented.  The  determination  of  the  overhead  and  its 
distribution  among  the  user  tasks  is  relatively  simple  for 
single  class  models,  as  all  jobs  are  assumed  to  be 
statistically  identical.  The  total  overhead  can  be 
distributed  evenly  among  all  user  tasks.  An  easy  way  to 
specify  the  interference  functions  for  the  I/O  operations  is 
then  just  to  determine  an  overhead  factor  that  requires  only 
one  measurement  point: 


overhead  factor 


physical  operations 
logical  service  requests 


For  the  CPU  instructions,  we  define: 


overhead  factor 


total  number  of  instructions 
number  of  problem  state  instructions 
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The  oferhead  factor  then  can  be  applied  to  the  direct 
service  deiand  of  a  user  job  to  specify  the  number  of 
operations  it  causes. 

This  method  of  defining  and  applying  the  interference 
functions  implies  that  the  total  system  overhead  increases 
in  proportion  to  the  multiprogramming  level.  For  each  job 
added  to  the  multiprogramming  level,  an  equal  amount  of 
service  demand  is  brought  into  the  system.  If  measurements 
are  obtained  from  different  workload  situations,  several 
overhead  factors  can  be  calculated  by  the  above  formula,  and 
then  general  interference  functions  depending  on  the 
multiprogramming  level  can  be  derived. 

Particular  cause  and  effect  relationships  are  not 
considered  in  this  method  of  determining  the  interference 
functions.  In  our  model,  all  user  jobs  are  coalesced  into 
one  class,  so  the  user  program  characteristics  as  reflected 
by  the  system  model  do  not  contain  enough  detail  to 
facilitate  the  use  of  any  particular  cause  and  effect 
relationships  for  the  parameter  estimation. 


Hultiproqramminq  aix 


In  a  single  class  model,  the  multiprogramming  mix  has 
only  one  component,  the  average  number  of  users  active  at  a 
time.  Among  the  different  ways  to  determine  this  overall 
multiprogramming  level,  the  most  accurate  way  is  probably 
the  calculation  from  the  measured  active  times: 


HPL 


resident  times 
T 


where  nPl  is  the  multiprogramming  level  and  T  is  the 
measurement  interval.  This  formula  can  be  used  because  the 
resident  times  are  available  from  the  SMF  data. 


Heasurement  Data  Reguirewents 


There  are  several  different  sets  of  neasureiient  data 
froB  uhich  the  system  model  parameters  could  be  specified. 
In  the  subsequent  sections^  the  following  set  will  be  used: 

general  parameters : 
measurement  interval 

jobs  completed  during  the  measurement  interval 
sum  of  tne  resident  rimes  of  all  tasks 

CPD  parameters : 

instructions  under  problem  state 
instructions  under  system  state 
total  busy  time 

I/O  parameters  (for  each  channel  and  disk) : 

utilization 

I/O  operations  (SIOs) 

accounted  I/O  requests  (EXCPs) , 

Spooling  Parameters : 

cards  read 
lines  printed 

memory  and  load  management  paraiaeters: 

pages  transferred  for  swaps 
pages  transferred  for  page  faults 
pages  transferred  for  vIO 

These  measurement  data  form  a  minimal  set  for  specifying 
the  system  model.  Additional  data,  which  can  easily  be 
obtained,  should  be  collected  for  consistency  checks  of  the 
data  in  the  minimal  set  and  for  the  validation  of  the  model. 


4.4.2  Validation  Phase 

Measurements 

The  measurements  for  this  model  were  performed  on  a 
benchmark  that  contains  three  sets  of  jobs:  GPJS  jobs,  HSJS 
jobs,  and  TSO  transactions  simulated  by  a  program  according 
to  a  TSO  command  script.  Unfortunately,  all  25  terminal 
sessions  that  are  simulated  in  the  benchmark  use  the  same 
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script,  so  that  a  soaeifhat  regular  behaviour  of  the  TSO 
interactions  occurred.  Many  interactions  requested  the  same 
resources  at  about  the  sane  tine.  The  monitors  RMF,  SMF, 
and  TORMON  were  used  to  record  the  required  data.  The  RMF 
records  and  the  SHF  records  were  written  to  the  same  data 
set.  For  gathering  the  measurements  used  here,  the  RMF 
sampling  cycle  was  set  to  two  seconds.  The  monitoring 
overhead  caused  by  RMF  was  not  detected  specifically,  but 
from  other  experiments  with  smaller  sampling  cycles,  it  is 
known  to  be  smaller  than  about  three  percent  [SDTH77], 
Before  beginning  the  measurement  experiment,  all  GPJS  jobs 
and  all  HSJS  jobs  were  read  into  the  system,  SHF  was 
activated.  At  the  actual  start  of  the  experiment,  the 
monitors  RMF  and  TORMON  were  activated,  the  benchmark 
streams  were  released,  and  the  TSO  simulator  was  started. 
After  1268  seconds  all  HSJS  jobs  were  processed,  after  1734 
seconds  the  last  TSO  terminal  ’logged  off*,  and  after  2108 
seconds  the  last  GPJS  job  terminated.  These  very  different 
class  elapsed  times,  the  times  during  which  the  classes  are 
active,  caused  great  changes  in  the  workload  of  the  system 
so  that  it  did  not  operate  at  equilibrium. 


Measurement  Data 


The  data  recorded  in  a  benchmark  run,  such  as  the  one 
just  described,  are  more  detailed  than  is  required  for  this 
model.  In  fact,  these  data  will  be  used  to  specify  the 
parameters  for  the  multiple  class  model  in  chapter  5  as 
well.  In  order  to  keep  the  description  for  this  model 
simple,  only  the  relevant  data  are  reported  in  table  3, 

The  utilizations  of  the  CPU  and  the  channels  3  and  4  are 
recorded  by  RMF  as  well  as  by  TORMON,  The  results  compare 
as  follows: 
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RMF 

TORMON 

deviation 

CPU 

.8116 

.  8197 

0.99  % 

Channel  3 

.  2908 

.  2686 

8.27  % 

Channel  4 

.  3041 

.3461 

12. 14  % 

The  deviations  are  normalized  by  the  TORMON  values.  As 
TORMON  records  the  actual  busy  time  as  opposed  to  RMF,  which 
samples  the  system  state,  the  TORHON  values  are  considered 
to  be  more  accurate  than  the  RMF  values* 


Table  3:  Measurement  data  for  the  single  class  model. 


General  Parameters ; 


measurement  interval: 
number  of  jobs  completed: 
sum  of  the  resident  times 


2108  secs 
1314 

9544  secs 


RMF 

SMF 

SHF 


CFO  Parameters: 


instructions  under  problem  state: 
instructions  under  system  state: 
total  CPU  busy  time: 


1. 60*109 

1.98*109 

1728  secs 


(TORMON 

(TORMON 

(TORMON 


I/O  Parameters : 


address 

SIOs 

(RMF) 

EXCPs 

(SMF) 

ut ilizatic 
(RMF) 

channel 

3 

47927 

8261 

.2686 

channel 

4 

72387 

20640 

.3461 

351 

10236 

— 

.  1586 

352 

2355 

— 

.  0482 

353 

7C2 

— 

.0208 

.  354 

12594 

8261 

.2285 

355 

17982 

— 

.3503 

356 

3634 

- 

.  0935 

357 

424 

— 

.0113 

451 

2285 

— 

.0444 

452 

749 

— 

.0104 

454 

8689 

— 

.2266 

456 

9682 

12685 

.1709 

457 

3447 

— 

.0463 

4D0 

5846 

86 

.0973 

4D  1 

31053 

907 

.4117 

4D2 

10098 

6958 

.1161 

pooling  Para 

mete  rs: 

cards  read: 
lines  printed: 


48822 

2.41*105 


(SMF) 

(SMF) 


4c 

4c 


Memory  and  Load  Management  Parameters : 

pages  transferred  for  swaps:  42911 

pages  transferred  for  page  faults:  13494 
pages  transferred  for  VIos:  5869 

*  These  values  are  gathered  by  TORMON, 


(smf: 

(SMF 

(RMF' 
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In  other  benchaarks  perforaed  by  UTCC  staffs  the  deviation 
of  the  RHF  values  froa  the  TOBHON  values  are  reported  to  be 
saaller  than  one  percent  [S0TH77],  The  large  error  in  this 
particular  experiaent  aay  have  been  caused  by  the  larger 
saapling  cycle  (2  seconds  rather  than  250  Billiseconds)  that 
was  chosen  to  keep  the  aonitoring  overhead  saall.  Also, 
1059  saaples  aay  not  provide  sufficient  accuracy,  if  the 
characteristic  of  the  systea's  operation  are  changing  as 
widely  as  in  this  benchaark.  Unfortunately,  we  do  not  have 
TORHON  values  for  the  disks,  so  that  we  have  to  rely  on  the 
disk  utilizations  that  are  given  by  RMF.  In  future 
experiments  smaller  sampling  cycles  should  be  chosen.  The 
utilizations  given  by  RMF  and  by  TORMON  should  always  be 
compared  in  order  to  obtain  some  intuition  about  the 
accuracy  of  the  RMF  data. 


Parameter  Estimation  and  System  Model 


P £551 § J § haviour  Model 


For  the  single  class  model,  an  'average*  job  is  defined 
based  on  all  accounted  resource  usage  of  all  job  classes. 
This  average  job  has  no  meaning  outside  the  modell  ing 
context,  as  it  is  defined  by  calculating  averages  over  jobs 
with  very  diverse  service  demands.  It  is  used  in  the  model 
to  define  the  workload  and  to  provide  a  measure  for  the  job 
throughput,  which  will  be  expressed  in  terms  of  average 
jobs.  The  service  requests  of  an  average  job  are  calculated 
as  the  total  accounted  service  requests  divided  by  the 
number  of  jobs  completed.  They  are  listed  in  table  4, 
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Table  4:  Service  requests  per  job  on  average. 


CPU  (problem  state  instructions) 

Channel  3 

channel  4 

disk  354 

dis)c  456 

disk  4D0 

disk  4D1 

disk  4D2 

VIO  EXCPs 

cards 

lines 


1.  121*10« 
6.29 
15.71 
6.29 
9.65 
0.065 
0.69 
5.  30 
9.95 
37.2 
182.6 


Interference  Pattern 


The  CPU  overhead  factor^  the  ratio  of  the  total  number 
of  instructions  to  the  number  of  problem  state  instructions 
is  2.24.  The  total  number  of  CPU  instructions  per  job  is 
2. 51*10^.  From  table  3  we  see  that  for  most  of  the  disks  we 
have  only  data  about  the  physical  operations  (SIOs)  ,  |DUt  no 
data  about  the  service  requests  (EXCPs) .  So  it  is  not 
possible  to  define  an  overhead  factor  for  these  disks.  For 
seme  of  the  disks,  we  know  the  data  sets  they  contain  (see 
table  2) .  So  we  can  at  least  establish  some  ratios  between 
service  requests  and  physical  I/O  operations. 

The  disk  packs  354  and  456  contain  only  user  data  whose 
service  requests  are  accounted.  So  we  can  determine 
SIO/EXCP  ratios: 

disk  354:  1.56 
disk  456:  0.76 

These  ratios  reinforce  the  fact  that  an  EXCP  may  cause  more 
than,  or  less  than,  one  SIO. 


Disk  351  is  the  spooling  disk.  A  total  of  289419 
accounted  spooling  records  results  in  10236  SIOs.  The  ratio 
of  spooled  records  to  SIOs  is  28.3.  The  overhead  ratio  is 
0.0354,  This  ratio  may  be  different  for  the  cards  read  and 
the  lines  printed  due  to  different  blocking  factors.  Eut  as 
we  have  no  indication  about  these  factors, 
equally. 


we  treat  them 
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Disk  356  contains  the  swap  data  set,  A  total  of  42911 
page  transfers  result  in  only  3634  SIOs,  The  ratio  of  page 
transfers  to  SIOs  is  11,8,  The  overhead  ratio  is  0,0847. 

All  other  disks  contain  either  system  data  sets  and  user 
data  sets  together  or  system  data  sets  whose  usage  cannot  be 
related  easily  to  accounted  service  reguests. 


Hultiproqramming  level 


From  the  sum  of  the  resident  times  and  the  measurement 
interval,  the  mean  multiprogramming  level  is  calculated  to 
be  4,53  average  jobs.  Because  the  average  multiprogramming 
level  is  a  fraction,  the  model  is  evaluated  for  the  closest 
integer  multiprogramming  levels  (4  and  5) ,  and  the  result  is 
obtained  by  linear  interpolation. 


Model  Workload 


The  mcdel  workload  specifies  the  number  of  physical 
operations  per  job  performed  on  average  on  each  of  the 
devices  represented  in  the  gueueing  network  model.  To 
determine  the  model  workload,  we  multiply  the  overhead 
ratios  by  the  service  requests.  For  the  system  components 
for  which  we  have  no  such  ratios,  we  simply  divide  the 
recorded  physical  operations  by  the  number  of  jobs.  This 
shortcut  could  certainly  be  used  for  all  servers  of  the 
model.  However,  if  changes  must  be  made  to  the  model,  it  is 
advantageous  to  have  as  much  detailed  informaticn  cn  the 
system  as  possible  represented  in  the  system  model.  Thus  it 
should  be  attempted  to  build  a  detailed  program  behaviour 
mcdel  and  to  establish  the  interference  pattern  in  as  much 
detail  as  the  measurement  data  permit.  The  model  workload 
in  physical  operations  per  job  on  average  is  listed  in  table 


5. 
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Table  5: 

Model  workload 

for  the  single 

class  mo 

CPU 

disk  451 

1.74 

channel  3 

36.47 

disk  452 

channel  4 

55.09 

disk  453 

0.  33 

disk  351 

1.19 

disk  454 

6.62 

disk  352 

1 .79 

disk  456 

7.37 

djsk  353 

0.53 

disk  457 

2.62 

disk  354 

9.58 

disk  4D0 

4.45 

disk  355 

13.68 

disk  4D1 

23.63 

disk  356 

2.77 

disk  4D2 

7.69 

disk  357 

0.32 

Resource  Attributes 

The  CPO  and  the  channel  capacities  are  calculated 
according  to  the  fcrmula  given  in  the  description  of  the 
design  phase.  In  order  to  obtain  the  disk  capacities^  we 
calculate  the  nean  tine  of  an  operation  for  each  disk  and 
each  channel  individually.  To  account  for  the  overlapped 
activity  of  disks  and  channels,  the  mean  channel  operation 
time  is  subtracted  from  the  mean  disk  operation  time.  The 
adjusted  disk  capacities  are  the  inverse  of  these  shortened 
disk  operation  times.  Thus  the  overlapping  time  is  modelled 
as  being  spent  at  the  channel,  and  the  time  a  task  occupies 
only  the  disk  is  modelled  as  being  spent  at  the  disk.  The 
capacities  of  all  modelled  components  are  listed  in  table  6. 


Table  6:  Capacities  of  the  modelled  hardware  components. 


CPO 

2.07*10® 

disk 

451 

32. 

.20 

channel  3 

84.65 

disk 

452 

51, 

,56 

channel  4 

99.22 

disk 

453 

27. 

.80 

disk 

351 

47.58 

disk 

454 

21, 

.78 

disk 

3  52 

31.74 

di  sk 

456 

36, 

,60 

disk 

353 

18.80 

disk 

457 

54, 

,45 

disk 

354 

37.56 

disk 

4D0 

39, 

,76 

disk 

355 

33.96 

disk 

4D1 

55, 

,53 

disk 

356 

23.43 

disk 

4D2 

70. 

.11 

d  isk 

357 

16.07 

The  values  are  operations  per  second. 


As  the  disk  capacities  are  calculated  using  the 
utilizations  given  by  BMP,  they  may  contain  large  errors,  in 
Particular  for  the  lightly  utilized  disks.  However,  errors 
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at  lightly  utilized  devices  do  not  introduce  much  error  into 
the  model.  The  large  variation  in  the  disk  capacities  is 
due  to  different  access  patterns  that  result  in  different 
average  seek  times. 


Queueing  Network  Model 


Because  some  of  the  disks  are  heavily  utilized^  we 
represent  the  disks  in  the  model  as  well  as  the  channels. 
The  structure  of  the  queueing  network  model  is  shown  in 
figure  3.  The  relative  utilizations  required  as  input 
parameters  are  calculated  from  the  capacities  and  the  model 
workload.  For  the  computation  of  the  performance  measures, 
the  program  described  in  [KSS  77]  is  used. 


Figure  3.  Queueing  Network  Model 
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Performance  Measures 


The  results  of  the  model  are  listed  in  table  7  below 
together  with  the  corresponding  measurement  data  and  the 
relative  errors.  The  disk  performance  measures  resulting 
from  the  model  have  only  limited  significance,  because  the 
disk  operations  in  ,the  model  do  not  directly  correspond  to 
disk  operations  in  the  actual  system.  Therefore,  they  are 
not  listed.  As  the  monitors  do  not  record  gueue  lengths, 
the  queue  lengths  resulting  from  the  model  cannot  be 
validated. 


Table  7:  Results  of  the  single  class  model. 

model  measurement  rel,  error 


results 

data 

(in  %) 

CPU 

util ization 
mean  queue  length 

0.8260 

1. 9789 

0.8197 

♦  0.76 

Channel  3 
utilization 
mean  queue  length 

0.2706 

0.3499 

0.2686 

♦  0.74 

Channel  4 
utilization 
mean  queue  length 

0.3487 

0.4887 

0. 3461 

♦  0.75 

job  throughput 

1324.2 

1314 

♦  0.76 

The  results  of 

the  model 

agree  very 

well  with  the 

corresponding  measurements.  This  suggests  that  an 
appropriate  level  of  detail  with  respect  to  the  required 
performance  measures  was  chosen  for  the  model. 
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5*.  A  Hierarchical  Model  with  Several  Classes  of  Customers 


In  this  chapter,  we  create  a  more  realistic  and  more 
complex  model  of  the  UTCC  computer  system.  It  is  comparable 
in  complexity  with  models  used  in  recently  published  case 
studies  [BCSE76,  GIAM76,  NEIL76,  KGT  77], 


The  objectives  of  this  validation  study  are: 


•  to  apply  and  to 
previous  chapter 


test  the  methodology  developed  in 
and 


the 


•  to  investigate  the  problems  of  practical  modelling 
using  inexpensive  modelling  techniques  and  commonly 
available  monitoring  tools. 


5.  1  Design  Phase 


5.1.1  Performance  Measurements 

The  system  related  performance  measures  we  want  to 
calculate  are  the  system  throughput  and  the  utilizations  and 
the  mean  queue  lengths  of  the  major  hardware  components  of 
the  system.  Of  the  user  related  performance  measures,  we 
want  to  obtain  the  mean  time-in-system  of  the  batch  tasks 
and  the  mean  response  time  of  the  TSO  interactions. 

The  mean  time-in-system  of  the  coalesced  batch  classes 
has  a  very  high  variance  due  to  the  widely  varying  service 
demands.  Thus  it  is  not  a  meaningful  performance  measure 
for  the  user.  He  can  reduce  this  variance  considerably  by 
modelling  the  two  batch  classes  separately.  Multiple  class 
models  are  very  important  in  that  they  provide  meaningful 
user  oriented  as  well  as  system  oriented  performance 
measures.  Single  class  models  can  give  only  system  oriented 
performance  measures  when  the  job  classes  have  very  distinct 
characteristics. 
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A  further  reduction  of  the  variance  could  be  achieved  by 
a  special  analysis  technique  that  conditions  the  nean  tine- 
in-systei  on  the  actual  service  demand  of  a  task  [LAZ077b]. 


5.1.2  Queueing  Network. Hodel 


The  queueing  network  model  must  contain  explicitly  at 
least  as  much  detail  as  the  desired  performance  measures 
require.  This  means  that  it  must  distinguish  the  three 
classes  GPJS,  HSJS,  and  TSO.  Also,  the  major  hardware 
components  of  the  system  must  be  represented. 

To  obtain  the  mean  time-in-system  tines  of  the  batch 
classes,  we  use  a  queueing  network  model  on  the  operating 
system  level.  The  throughput  rates  of  the  TSO  class  of  this 
model  are  used  as  service  rates  in  a  birth-death  model  that 
is  employed  to  calculate  the  mean  response  time  for  TSO 
interactions  at  the  user  interface  level. 

In  order  to  keep  the  solution  of  the  models  simple  and 
inexpensive,  we  will  use  a  local  balance  queueing  network 
model  and  a  general  birth-death  model  [KLEI75]. 

It  would  be  desirable  to  model  the  I/O  subsystem  by  load 
dependent  servers.  However,  because  the  measurements 
required  for  the  determination  of  the  load  dependent  service 
rates  would  be  very  expensive,  we  restrict  ourselves  to  load 
independent  servers. 

Only  hardware  resources  will  be  represented  explicitly 
in  the  queueing  network  model.  The  demand  for  software 
resources  such  as  paging  or  swapping  will  be  covered 
implicitly  as  demand  on  the  hardware  resources. 

Hodel  Type  =  hierarchical 

a)  general  network 
3  classes 

local  balance  disciplines 

workload  vectors 

load  independent  servers 
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b)  birth-death  nodel 
single  class 
FCFS 

exponential,  aean 
load  dependent  server 

Modelling  level:  a)  operating  systea  level, 

logical  approach, 
b)  user  interface  level 

Input  Paraaeter  Requirements: 

a)  number  of  servers,  M 
number  of  classes,  R 
multiprogramming  mix 
relative  uitlizations  Xir 

1  ~  r  —  1,«««,R: 

b)  mean  think  time 

mean  number  of  active  terminals 
maximal  number  of  concurrently  active 
TSO  requests.  Max 
load  dependent  service  rates,  u (s) 
s  =  1, . • . , Max; 


5.1.3  System  Model  of  the  Queueing  Network  Model 


Resource  Attributes 


In  the  multiple  class  model,  the  same  hardware  resources 
as  in  the  single  class  model  are  represented.  The 
capacities  are  assumed  to  be  class  independent.  This 
assumption  may  not  hold  in  reality,  as,  for  example,  there 
is  evidence  that  the  high  speed  cache  hit  ratio  is  class 
dependent  [SUTH??].  Because  no  measurement  data  are 
available  that  would  allow  the  specification  of  class 
dependent  capacities,  we  use  the  class  independent 
capacities  of  the  single  class  model  as  approximation. 

To  determine  the  CPU  capacity,  we  also  consider  the  high 
speed  cache  hit  ratio,  which  is  the  proportion  of  the 
instructions  that  access  only  the  high  speed  cache  memory. 
This  ratio  can  influence  the  CPU  capacity  considerably. 

The  actual  disciplines  used  by  the  SRM  are  difficult  to 
determine  and  virtually  impossible  to  model  in  detail.  They 
will  be  approximated  by  the  PS  discipline  for  the  CPU  and  by 
the  FCFS  discipline  for  all  other  components.  Because  only 
the  hardware  resources  will  be  represented  in  the  queueing 
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network  nodel,  no  disciplines  (and  also  no  capacities)  need 
be  specified  for  swapping*  paging*  and  VIO, 


Model  Workload 


The  relative  workload  for  the  servers  of  the  queueing 
network  acdel  must  be  specified  by  class.  The  four  levels 
of  detail  defined  in  4,4.1  can  be  applied  here  as  well.  let 
L(i,r)  be  the  total  workload  placed  on  server  i  by  tasks  of 
class  r.  The  relative  workload  W  (i,r)  of  server  i  due  to 
class  r  can  be  defined  as: 

a)  total  workload:  H  (i*r)  =  L(i*r) 

b)  load  per  partition:  H(i*r)  =  L(i*r)  /  MPL(r) 

c)  load  per  task:  W(i*r)  =  L(i*r)  /  Jobs(r) 

d)  load  per  cycle:  W{i*r)  =  L(i*r}  /  Cycles  (r) 

HPL(r):  mult iprograiBBiing  level  of  class  r 

Job£(r):  nufflber  of  class  r  tasks  completed 

Cycles (r):  cycles  through  the  network  by  tasks 
of  class  r  (=  total  number  of  I/O 
requests  by  tasks  of  class  r) 

The  workload  vector  has  now  actually  two  dimensions.  It 
contains  the  relative  workload  for  each  class  at  each 
device.  As  pointed  out  in  section  4,4,1*  these  definitions 
are  equivalent  for  modelling  purposes.  The  idea  of  the 
workload  vector  for  multiple  class  models*  on  which  the 
above  considerations  are  based*  is  developed  in  [GIAM76b], 


Program  Behaviour  Model 


For  queueing  network  models  using  local  balance  solution 
techniques*  no  dynamic  behaviour  needs  to  be  represented  in 
the  system  model.  For  all  resources  of  the  system  model  the 
mean  usage  per  task*  broken  down  by  classes*  must  be 
specified . 
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resources 

CPU 

file  I/O 
VIC 


workload  dimension 

instructions 

I/O  requests  per  channel  and  device 
VIO  requests 


Interference  Pattern 


Because  the  aodel  is  assumed  to  be  in  local  balance,  no 
dynamic  interference  has  to  be  considered. 

The  determination  of  how  much  overhead  is  caused  by  what 
class  is  very  important  for  multiple  class  models.  If  no 
changes  to  the  operating  system  are  to  be  modelled,  it  is 
net  crucial  to  determine  the  operating  system  usage  of  the 
resources  accurately  as  long  as  as  it  is  attributed  to  the 
job  class  that  causes  it.  SMF,  for  example,  does  not 
distinguish  between  CPU  usage  by  a  user  task  and  the 
overhead  it  caused.  All  overhead  whose  immediate  cause  can 
be  recognized  is  accounted  to  the  task  that  causes  it. 
However,  it  is  essential  that  the  unaccounted  resource  usage 
be  credited  to  the  classes  in  the  correct  proportions.  Due 
tc  the  limited  time  available  for  monitoring,  no  special 
measurements  could  be  performed  to  establish  cause  and 
effect  relationships.  Instead,  assumptions  are  made  about 
the  cause  of  the  unaccounted  resource  usage.  A  breakdown 
ratio  for  distributing  unaccounted  operations  among  the  user 
tasks  will  be  established  in  the  validation  phase. 


Multiprogramming  flix 


Because  the  three  different  job  streams  finish  at 
different  times,  three  phases  of  the  benchmark 
multiprogramming  mix  can  be  distinguished.  During  the  first 
part  of  the  benchmark,  all  three  classes  are  active.  During 
the  second  part,  GPJS  and  TSO  are  active,  and  during  the 
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third  part,  only  GPJS  is  active.  These  very  large  end 
effects  of  the  benchnarJc  aight  also  cause  probleas  in  the 
specification  of  other  systea  aodel  paraaeters.  For  the  TSO 
class  we  do  not  need  to  specify  a  particular 

au Iti prog raaming  level.  All  possible  aultiproqramming 
levels  are  evaluated,  and  the  performance  measures  are 

calculated  by  the  birth-death  model  on  the  user  interface 

m 

level , 


5.1.4  System  Model  of  the  Birth-Death  Model 

Resource  Attributes 

The  server  of  the  birth-death  model  represents  the 
entire  computer  system.  The  SRM  schedules  jobs  according  to 
a  priority  algorithm  that  is  designed  to  balance  the  overall 
load  of  the  system.  However,  in  the  benchmark  all  TSO  tasks 
have  the  same  characteristics,  so  the  FCFS  discipline  is  a 
good  approximation.  The  throughput  rates  of  the  queueing 
network  model  at  different  multiprogramming  mixes  are  used 
as  the  load  dependent  capacities  for  the  corresponding 
multiprogramming  levels  of  the  birth-death  model. 


Model  Workload 

The  model  workload  is  specified  in  terms  of  the  mean 
number  of  active  terminals. 


Program  Behaviour  Model 


The  program  behaviour  model  for  the  birth-death  model  is 
very  simple.  The  tasks  circulate  between  the  terminals  and 
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the  computer  system.  The  mean  thinJe  time  is  the  only 
parameter  that  has  to  be  specified.  The  service  demand  is 
one  interaction  per  cycle. 


Interference  Pattern 


The  swapping  cf  temporarily  inactive  tasks  and  the  load 
balancing  could  be  modelled  as  dynamic  interference. 
However,  to  keep  the  model  simple,  this  will  not  be  done. 
All  overhead  is  covered  by  the  model  on  the  operating  system 
level,  so  no  interference  functions  have  to  be  specified  for 
the  birth-death  model. 


Multiprogramming  Mix 


The  parameter  of  the  birth-death  model  that  corresponds 
to  the  multiprogramming  mix  in  the  gueueing  network  model  is 
the  mean  number  of  active  terminals.  Another  related 
parameter  is  the  maximum  multiprogramming  level,  which 
restricts  the  number  of  service  rates  required. 


Resource  Attributes 


The  resource  attributes  of  the  birth-death  model  are  the 


lead  dependent  service  rates. 
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5t 1. 5  Heasureient  Data  Reqqirements 


General  Paraaeters : 

Beasurement  interval  length 

tasks  conpleted,  by  class 

SUB  of  the  active  tines,  by  class 

terBinals  active 

Bean  think  tine 


CPD  Paraaeters; 

accounted  CPO  tine,  by  class 
total  nuBber  of  instructions 
total  busy  tine 
high  speed  cache  hit  ratio 
high  speed  cache  access  tine 
aain  aemory  access  tiie 


I/O  Parameters : 

I/O  requests  per  class  and  device  (EXCPs) 

I/O  requests  per  device  (SIOs) 

busy  times  of  the  devices  and  the  channels 


Spooling  Paraceters : 

cards  read-  by  class 
lines  printed,  by  class 


Hemory  and  load  Managenent  Paraaeters : 

swaps,  by  class 

pages  transferred  for  swaps,  by  class 
pages  transferred  for  swaps,  total 
pages  transferred  for  page  faults,  by  class 
pages  transferred  for  page  faults,  total 
pages  transferred  for  VIO,  by  class 
pages  transferred  for  VIO,  total 


5.2  Validation  of  the  Queueing  Network  Model 

5.2.1  Measurements 

The  measurement  data  for  this  model  are  taken  from  the 
benchmark  experiment  that  is  described  in  section  4.4.2. 
This  time,  more  detailed  data  are  extracted  from  the 
recorded  measurements. 
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Heasureoent  Data 


The  leasureaent  data  for  the  aultiple  class  aodel  are 
shown  in  table  8.  Note  that  SHF  records  the  CPO  time  spent 
under  Systea  Bequest  Blocks  (SBBs)  and  the  CPO  time  spent 
under  Task  Control  Blocks  (TCBs) •  Both  variables  contain 
the  problea  CPO  tiae  and  soae,  but  not  all,  of  the  system 
state  time.  Because  more  of  the  overhead  time  is  accounted 
under  the  TCBs,  the  TCB  CPO  time  is  chosen  as  the  accounted 
CPO  tiae. 


Table  8:  Measurement  data  for  the  multiple  class  model. 


General  Paraaeters : 

measurement  interval:  2108  secs  (BMF) 


GPJS 

TSO 

HSJS 

Source 

jobs  completed 

56 

950 

308 

(SMF) 

total  resident  tiae 
(in  seconds) 

4564 

4252 

1268 

(SMF) 

job  class  elapsed 
time  (secs) 

2108 

1734 

1268 

(SHF) 

aulti programming 
level  (DfiMPL  ♦) 

2.  43 

2.  18 

0.59 

(BMF) 

average  nuaber  of 
ready  tasks  (DMBOA  ♦) 

4.  84 

8.  18 

1.0 

(BMF) 

CFO  Parameters: 

busy  tiae  1728  secs 

problem  state  time  774  secs 

system  state  time  954  secs 

total  instructions  3.582*10'* 

high  speed  cache  accesses  5.041J10^ 

high  speed  cache  hit  ratio  .9555 

utilization  .8143 

[TOBMON] 
TOBMON 
TOBMON 
TOBMON 
TOBMON' 
TOBMON 
BMF,  C( 

:VOTILP  ♦ 

accounted  CPO  tine  (under 
GPJS:  784.9  secs  (SHF 
TSO:  214.4  secs  (SHF 
HSJS:  306.2  secs  (SHF 

TCBs) 

♦  internal  operating  system  variables, 
BMF  trace  option 


obtained  by  the 
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Table  8  continued. 


I/O  Paraieters : 


address 

SIOs 

(fiHF) 

GPJS 

EXCPs 

(SMf) 

TSO 

HSJS 

ut ilizati 
(HHF) 

channel  3 

47927 

8261 

— 

— 

.2686 

channel  4 

72387 

17649 

1324 

1666 

.3461 

disk 

351 

10236 

— 

— 

— 

.1586 

disk 

352 

2355 

M 

— 

— 

.  0482 

disk 

353 

702 

— 

— 

— 

.0208 

d  isk 

354 

12594 

8261 

— 

— 

.2285 

disk 

355 

17982 

— 

— 

— 

.3503 

disk 

356 

3634 

— 

— 

— 

.0935 

disk 

357 

424 

— 

— 

— 

.0113 

d  isk 

451 

2285 

— 

— 

— 

,0444 

d  isk 

452 

749 

— 

— 

— 

.0104 

disk 

453 

434 

— 

— 

— 

.  0094 

d  isk 

454 

8689 

— 

— 

— 

.  2266 

disk 

456 

9682 

12685 

— 

— 

.  1709 

disk 

457 

3447 

— 

— 

— 

.0463 

disk 

4D0 

5846 

86 

— 

- 

.0973 

disk 

4D1 

31053 

— 

— 

907 

.4117 

disk 

4D2 

10098 

4874 

1325 

759 

.1161 

Spooling  Parameters: 

GPJS 

TSO 

HSJS 

source 

cards 

:  read 

15352 

925 

32545 

(SMF) 

lines 

printed  184517 

2300 

53780 

(SMF) 

Memory  and 

Load  Management  Parameters: 

GPJS 

TSC 

HSJS 

source 

s  vaps 

93 

1361 

1 

(SMF) 

pages  transferred 
tor  swaps 

51  84 

37666 

61 

(SMF) 

pages  transferred  for 
address  space  page 
faults 

2390 

10279 

825 

(SMF) 

pages  transferred  for 
common  area  page 
faults 

1781 

90 

2 

(SHF) 

VIO-EXCPs 

12836 

150 

83 

(SMF) 

VIC-SIOs  (total) 

5869 

(RMF) 

5.2,2  Parameter  Estination 


Most  of  the  effort  in  the  paraneter  estimation  is 
required  for  calculating  the  number  of  physical  operations 
by  class  and  device  from  the  accounted  service  requests. 
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Program  Behaviour  Model 


The  service  demand  on  each  service  facility  by  job  of 
each  class  on  average  is  calculated  from  the  accounted 
service  requests.  Because  the  service  demand  of  jobs 
belonging  to  the  same  class  are,  at  least  to  a  certain 
degree,  homogeneous,  these  •’average"  jobs  can  be  viewed  as 
prototype  jobs  for  their  class,  and  their  performance 
measures  have  meaning  also  to  the  users.  The  service 
requests  of  the  three  average  jobs  are  listed  in  table  9. 
The  numbers  of  CPU  operations  requested  are  calculated 
assuming  a  homogeneous  instruction  rate.  In  practice  the 
service  rate  may  be  class  dependent  due  to  differing  high 
speed  cache  hit  ratios.  However,  there  are  no  measurement 


data  available  to 

Table  9:  Program 
model . 

quantify 

behaviour 

GPJS 

these  ratios  by  cla 

model  for  the  multi 

TSO  HSJS 

number  of  tasks 

56 

950 

308 

CPU  ♦ 

29.058 

.  468 

1.186 

channel  3 

147.51 

— 

— 

channel  4 

315.16 

1.394 

5.409 

disk  354 

147. 51 

— 

— 

disk  456 

222.52 

— 

— 

disk  4D0 

1.536 

— 

— 

disk  4D1 

— 

— 

2.945 

disk  4D2 

87.035 

1.  395 

2.464 

pages  transferred 

for  VIOs 

229.21 

0.  158 

0.269 

cards  read 

274 

1 

106 

lines  printed 

3735 

92 

174 

♦  in  millions  of  instructions 

The  values  are  service  requests  per  average  job. 


Interference  Pattern 


In  determining 
important  modelling 
breakdown  of  usage 


the  interference  pattern,  the  most 
decisions  are  made.  An  accurate 
by  job  class  is  not  available  for  any 
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systea  resource.  Instead,  data  related  to  this  breakdown 
are  available  froi  which,  in  soae  cases,  conclusions  about 
the  actual  breakdown  can  be  drawn.  For  cases  where  this  is 
not  possible,  a  general  breakdown  ratio  will  be  defined. 
Inforaation  about  the  cause  and  effect  relationships  in  the 
systea  is  not  sufficient  to  allow  the  definition  of  actual 
overhead  functions  th'at  would  distribute  the  unaccounted 
resource  usage  depending  on  aeasurable  system  variables. 
However,  the  ratios  of  physical  operations  to  accounted 
service  reguests  established  for  the  single  class  model  hold 
for  the  multiple  class  model  as  well. 

Spooling  Disk  (351): 

The  SIOs  of  the  spooling  disk  are  distributed  among  the 
classes  according  to  the  numbers  of  cards  read  and  lines 
printed. 

Local  Page  Data  Sets (355,  454): 

The  SIOs  are  attributed  to  the  classes  according  to  the 
number  of  pages  transferred  for  address  space  page 
faults  and  VIO-EXCPs.  With  respect  to  disk  454,  it  is 
assumed  that  half  of  its  usage  is  due  to  paging,  and  the 
other  half  is  due  to  accesses  to  the  master  catalog. 
This  second  half  is  distributed  according  to  the  general 
breakdown  ratio  that  will  be  defined  later. 

Common  Area  Page  Data  Sets  (453): 

The  SIOs  are  distributed  among  the  classes  according  to 
the  number  of  common  area  page  transfers  reported  by 
SHF. 

Swap  Data  Set  (356)  : 

The  SICs  are  distributed  according  to  the  number  of  page 
transfers  for  swaps.  The  single  swap  for  HSJS  is 
neglected. 

Disks  containing  only  user  data  sets  (354,  456)  : 

On  these  disks,  EXCPs  are  only  accounted  for  GPJS.  So 
all  SICs  are  attributed  to  GPJS. 
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Disks  containing  user  and  systea  data  sets  (4D0,  4D1,  4D2) : 
For  the  user  data  sets,  a  SIO/E1CCP  ratio  of  one  is 
assumed.  The  remaining  SIOs  on  disk  4D1  are  attributed 
to  TSO,  because  this  disk  contains  the  TSO  catalog.  The 
remaing  SIOs  on  disk  4D2  are  distributed  according  to 
the  general  breakdown  ratio. 

General  Breakdown  Ratio: 

The  general  breakdown  ratio  is  defined  somewhat 
arbitrarily,  as  there  is  no  guantifiable  evidence  of  how 
it  should  be  defined.  The  following  rationale  nay  be  as 
good  as  any  other.  Half  of  the  resource  usage  that 
cannot  be  distributed  among  the  classes  otherwise  is 
distributed  according  to  I/O  related  parameters  (the  sum 
of  EXCPs,  VIOs,  page  transfers  for  swaps  and  page 
faults).  It  is  assumed  that  much  of  the  unaccounted 
system  activity  is  associated  with  these  operations. 
The  other  half  is  distributed  according  to  the  number  of 
jot  step  initiations,  because  the  initiatiors  are  viewed 
to  be  another  important  cause  of  overhead.  The 
resulting  breakdown  ratio  is: 

GPJS:  ,276 

TSO:  .600 

HSJS:  .124 

After  applying  this  breakdown  ratio  to  the  numbers  of 
CPU  instructions,  we  can  establish  a  ratio  of  accounted 
operations  to  total  operations  (assuming  a  homogeneous 
instruction  rate)  of  each  job  class: 

GPJS:  .871 

TSO:  .458 

HSJS:  ,854 


105 


Holtiprogramainq  Mix 

The  average  Bultiprograaa ing  levels  for  GPJS  and  for  TSO 
are  deternined  froa  the  sun  of  the  resident  tines.  The  HSJS 
jobs  can  have  at  most  a  nultiprogranning  level  of  one.  So 
their  average  nultiprogranning  level  can  be  calculated  as 
the  class  elapsed  tine  divided  by  the  neasurenent  interval 
length.  The  resulting  nultiprogranning  nix  is: 

GPJS:  2.1 
TSO:  2.0 
HSJS:  0.6 

Because  the  nultiprogranning  mix  reported  by  BHF  is 
determined  by  sampling,  it  is  not  considered  to  be  as 
accurate  as  the  above  values.  Using  the  mean 
multipcogramning  nix  ignores  the  large  variation  in  the 
multiprogramming  levels  of  the  GPJS  jobs  and  the  TSO 
interactions.  It  also  ignores  the  heavy  interaction  between 
the  multiprogramming  levels  due  to  the  load  balancing 
efforts  of  the  SRH.  However,  considering  the  information 
available,  this  is  the  best  approximation  we  can  make. 


Model  Workload 


From  the  program  behaviour  model  and  the  interference 
pattern,  the  model  workload  is  calculated  as  the  average 
number  of  physical  operations  per  server  and  class.  The 
results  are  presented  in  table  10. 


106 


Table 

10: 

Model  workload 

for  the 

multiple  cla 

address 

GPJS 

TSO 

HSJS 

CPU  ♦ 

33.33 

1.02” 

2T4I 

chann 

el  3 

488.7 

16.4 

1  6.  3 

channel  4 

404.0 

48. 1 

13.0 

disk 

351 

126.1 

0.12 

9.  91 

disk 

352 

11.6 

1.48 

0.  95 

disk 

353 

3.5 

0.443 

0.28 

disk 

354 

224.9 

d  isk 

355 

112.7 

10.7 

5.0 

d  isk 

356 

7.8 

3,4 

disk 

357 

2.1 

0.27 

0.  17 

disk 

451 

11.3 

1.4 

0,93 

disk 

452 

3.7 

0.47 

0.  30 

disk 

453 

1  .8 

0.34 

0.  055 

disk 

454 

66.1 

4.9 

1,  3 

d  isk 

456 

172.9 

disk 

457 

17.0 

2.  2 

1.4 

disk 

4D0 

28.8 

3.7 

2,  4 

disk 

4D1 

— 

31.7 

2.  9 

disk 

4D2 

281.1 

3.4 

3.7 

♦  in 

mini 

ons  of  instruct 

ions 

The  V 

alues 

are  operations 

per  job 

• 

Resource  Attributes 


As  stated  in  the  validation  phase,  we  use  the  capacity 
values  of  the  singe  class  nodel  (see  table  6) •  For  the  CPU, 
however,  we  want  to  represent  the  dependence  of  the  capacity 
on  the  high  speed  cache  hit  ratio  that  was  0.9555  in  the 
benchnark  experiment.  The  access  times  are  80  nanoseconds 
for  the  high  speed  cache  and  2  microseconds  for  the  main 

■V 

memory.  Each  instruction  performs  on  the  average  1.4  memory 
accesses.  For  every  access,  first  the  high  speed  cache  is 
searched,  and  only  if  the  reguired  address  is  not  there,  the 
main  memory  is  accessed.  We  can  calculate  the  mean  memory 
access  time  per  instruction  as  follows: 


la  =  1 . 4  ♦  (Ic  ♦  Im  ♦  (1  -  P)  ) 
=  0.2380  microseconds 


la:  mean  access  time  per  instruction 

Ic:  access  time  to  the  high  speed  cache 
Im:  access  time  to  the  mam  memory 
P:  high  speed  cache  hit  ratio 


The  total  mean  instruction  time  of  0.482  microseconds 
can  now  be  split  into  the  mean  memory  access  time  and  the 
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mean  execution  time  of  0*244  oicroseconds*  With  this 
breahdovn  and  the  above  formula,  the  CPU  capacity,  the 
inverse  of  the  mean  instruction  tine,  can  be  determined  as  a 
function  of  the  high  speed  cache  hit  ratio. 


5*2*3  Paraieter  Estination  and  Queueing  Network  Hodel 


The  structure  of  the  queueing  network  model  is  the  same 
as  for  the  single  class  model  (see  figure  3) *  The  input 
parameters  required  are  the  multiprogramming  mix  and  the 
relative  utilization  of  each  device  broken  down  by  job 
class* 


5.2.4  Solution  and  Performance  Measures 


The  model  is  solved  using  the  program  described  in 
[ KSS  77]*  In  order  to  obtain  the  performance  measures  for 
the  average  multiprogramming  mix,  the  model  is  evaluated 
under  the  neighbouring  integer  multiprogramming  mixes,  and 
the  results  are  linearly  interpolated.  The  linear 
interpolation  seems  to  be  a  good  approximation,  since  the 
order  in  which  classes  are  chosen  to  do  interpolation  does 
not  affect  the  results* 

The  results  of  the  model  together  with  the  corresponding 
measurement  data  and  the  relative  errors  where  applicable 
are  shown  in  tables  11  and  12*  The  disk  performance 
measures  are  not  listed*  They  are  not  very  meaningful  due 
to  the  special  calculation  of  the  disk  capacities* 
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Table  11:  Utilizations  and  throughputs  of  the  multiple  class 
model. 


utilizations  throughput 


CPU 

CH  3 

CH  4 

GPJS 

TSO 

HSJS 

model 

results 

.  8494 

.2808 

.3693 

57,3 

1042 

303 

measurement 

data 

.8197 

.2686 

.3461 

56 

950 

308 

relative 
error  (%) 

♦  3.6 

♦  4.4 

♦  6.7 

♦  2.4 

♦  9.7 

-1,6 

The  throughput  values  are  jobs  per  second. 


Table  12:  Mean  queue  lengths  of  the  multiple  class 
model • 

mean  queue  lengths 


GPJS 

TSO 

HSJS 

Total 

CPU 

1.0552 

0.6526 

0.4168 

2.  1246 

Channel 

3 

0.2020 

0.  1262 

0.0375 

0.3657 

Channel 

4 

0.  162  5 

0.3215 

0. 0288 

0. 5128 

Assuming  FCFS  dispatching  disciplines,  the  mean  resident 
times  can  be  calculated  as  the  inverse  of  the  throughput 
times  the  multiprogramming  level  of  a  class: 

GPJS:  81.5  secs 

HSJS:  3.4  secs 

The  response  time  of  the  TSO  interactions  will  be  determined 
by  the  birth-death  model. 

These  results  seem  to  indicate  that  the  cverall  resource 
usage  is  represented  more  or  less  accurately,  but  that  too 
much  of  the  overhead  is  attributed  to  HSJS  and  too  little  to 
TSO.  The  general  breakdown  ratio  is  a  critical  parameter 
for  this  model,  A  small  experiment,  whose  results  are  shown 
in  table  13,  can  show  this. 
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Table  13:  Error  calculation  for  different  breakdown  ratios. 


breakdown  ratio 

GPJS  TSO  HSJS 

relative 

GPJS 

throughput 

TSO 

error 

HSJS 

.46  5 

.  508 

.027 

-12.7 

♦  23.5 

♦  21.9 

.365 

.608 

.027 

-6.9 

♦  9.5 

♦  23.0 

.276 

.600 

.124  ' 

♦  2.4 

♦  9.7 

-1 .6 

The  first  ratio  is  derived  by  distributing  the  unaccounted 
resource  usage  only  according  to  the  I/O  related  data. 
Apparently,  too  much  overhead  is  attributed  to  GPJS,  and  not 
enough  to  TSO  and  HSJS.  A  shift  of  10%  of  the  overhead  in 
the  ratio  from  GPJS  to  TSO  improved  the  results  for  GPJS  and 
for  TSO.  However,  the  errcr  for  HSJS  becomes  even  larger. 
The  third  ratio,  derived  by  the  rationale  described  in  the 
section  on  the  interference  pattern,  results  mainly  in  a 
shift  of  the  overhead  from  GPJS  to  HSJS.  The  TSO  error  is 
only  slightly  changed.  It  should  be  noted  that  due  to  the 
different  resource  demands  by  the  job  classes,  an  error  of 
one  percent  in  the  GPJS  throughput  means  a  much  larger 
m isal location  of  resource  usage  than  an  error  of  one  percent 
in  the  HSJS  throughput.  As  a  conseguence  of  this 
experiment,  we  know  that  the  distribution  of  unaccounted 
resource  usage  must  be  performed  with  great  care.  The  ideal 
case  would  certainly  be  if  all  resource  usage  were  accounted 
to  user  tasks  by  the  monitors.  Where  this  is  not  possible, 
the  breakdown  ratio  can  be  adjusted  until  the  utilization 
and  the  throughput  values  of  the  model  match  the  measured 
values  as  well  as  possible.  This  calibration  then  must  be 
validated  by  a  model  based  on  an  new  set  of  measurement  data 
collected  under  a  different  workload. 
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5.3  Validation  of  the  Birth-Death  Model 


5.3.1  Parameter  Estimation 


System  flodel 


Program  Behaviour  Bodel:  The  mean  think  time  as  specified  by 
the  terminal  simulator  is  12  seconds.  The  service 
request  per  cycle  is  one  interaction. 

Interference  Pattern:  No  interference  is  captured  by  the 
birth-death  model. 

Multiprogramming  Level:  The  maximum  number  of  active 

terminals  is  25.  Due  to  end  effects  within  the  TSO 
class,  the  mean  number  of  active  terminals  during  the 
TSO  class  elapsed  time  is  23.  The  maximum 

multiprogramming  level  of  TSO  interactions  in  the  system 
cannot  be  determined  as  it  is  dependent  on  the  activity 
of  the  other  job  classes.  Considering  the  large  backlog 
of  TSC  interactions  that  is  indicated  by  the  high 

average  number  of  ready  users,  the  average 
multiprogramming  level  must  be  close  to  the  maximum 
multiprogramming  level.  The  average  multiprogramming 
level  during  the  TSO  class  elapsed  time  is  calculated 
frcm  the  residence  times  to  be  2.45.  The  assumption  of 
constant  average  multiprogramming  levels  for  the  other 
classes  does  not  hold  in  practice  because  the  SRM 
balances  the  overall  lead  of  the  system,  so  that  the 
multiprogramming  level  of  the  different  classes  are 
highly  correlated.  However,  we  do  not  have  any 

information  about  this  correlation.  So  the  best 
approximation  is  the  assumption  of  steady  average 

multiprogramming  levels.  If  a  joint  probability 
distribution  of  the  multiprogramming  mix  could  be 
sampled,  we  could  circumvent  this  approximation. 


Model  Workload:  The  model  workload  is  determined  by  the 
think  time  and  the  mean  number  of  active  terminals. 


mean 


Ill 


Resource  Attributes:  The  throughput  rates  of  the  queueing 
network  model  are  used  as  capacities  for  the  birth-death 
model.  Assuming  average  multiprogramming  levels  of  2.1 
for  GPJS  and  0.73  for  HSJS  (these  are  the  average 
multiprogramming  levels  during  the  TSO  class  elapsed 
time) ,  the  following  capacities,  the  numbers  of 
interactions  com'pleted  per  second,  are  calculated  by 
linear  interpolation: 


TSO  MPL 

1 

2 

3 


Capacities 

.  2764 
.4943 
.  6686 


5.3.2  Solution  and  Performance  Measures 


The  parameters  of  the  system  model  can  be  used  directly 
as  input  to  the  solution  algorithm,  so  that  no 
transformation  need  be  performed  for  the  parameter 
representation.  For  the  structure  of  the  birth-death  model, 
see  figure  4. 

It  is  solved  by  a  program  described  in  [KSS  77].  The 

t 

model  is  evaluated  for  maximal  multiprogramming  levels  of  2 
and  3,  and  the  results  are  linearly  interpolated.  The 
expected  number  cf  interactions  in  the  system  is  a 
sufficiently  linear  function  of  the  maximal  multiprogramming 
level  to  allow  this  approximation. 

Table  14:  Results  of  the  birth-death  model. 


MPL 

E[n] 

throughp 
model  rel 
per  sec 

ut 

. error 
{%) 

response  time 
model  rel. error 
(sec)  (%) 

2.4 

15.92 

.5557 

♦  1.4 

28.64  -5.93 

2.  5 

16.  12 

.5730 

♦  4.6 

28.14  -7.59 

2.  6 

16.33 

.  5902 

♦  7.7 

27.67  -9.13 

In  order  to  compare  the  model  results  with  the  measured 
data,  we  calculate  the  mean  response  time  from  the 
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Figure  4.  Bi rth-death  model . 


measurement  data  following  some  operational  considerations 
(a  similar  way  of  calculation  can  be  found  in  [B0ZE76  1). 
Subtracting  the  sum  of  the  think  times  from  the  sum  of  the 
elapsed  times  of  all  TSO  interactions,  we  get  the  total  time 
the  interactions  spend  in  the  system.  Dividing  this  time  by 
the  number  of  interactions,  we  obtain  a  mean  response  time 
of  30.45  seconds.  Considering  this  response  time  and  the 
mean  think  time  of  12  seconds,  the  benchmark  certainly  does 
not  represent  a  very  realistic  TSO  load.  Dividing  the 
number  of  interactions  by  the  TSO  class  elapsed  time,  we 
obtain  a  mean  throughput  of  0.5478  interactions  per  second. 

There  are  several  possible  explanations  for  the  errors. 
Because  the  throughput  of  the  queueing  network  model  is  too 
high,  the  throughput  of  the  birth-death  model  must  certainly 
also  be  too  high.  The  real  maximum  multiprogramming  level 
for  TSO  is  not  fixed  as  in  the  model.  The  state 
probabilities  of  the  birth-death  model  show  that  for  all 
cases  evaluated  the  probability  that  the  multi programm ir g 
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level  is  at  its  laxiaum  is  greater  than  0.99.  This  is 
consistent  with  the  observation  of  the  large  backlog  of  TSO 
interactions.  So  the  aul ti program! in g  level  for  TSO  tasks 
follows  the  maximum  multiprogramming  level  as  specified  by 
the  SRH^  and  it  is  not  determined  by  the  arrival  stream  and 
the  capacities  of  the  server  as  assumed  by  the  model. 
Considering  this  observation,  it  is  questionable  whether  we 
should  use  a  birth-death  model  at  all.  Under  the 
circumstances  given  above,  the  results  are  very  reasonable. 
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6,  Extensions 


After  developing  what  coaid  be  viewed  as  a  basic  local 
balance  nodel,  we  want  to  show,  in  teras  of  the  methodology 
outlined  in  chapter  4,  how  to  extend  the  capabilities  of  the 
model.  The  range  of  computer  system  features  that  can  be 
modelled,  as  well  as  the  amount  of  detail  captured  by  a 
model,  can  be  increased  considerably.  The  extensions  we 
discuss  are  divided  into  three  categories  according  to  the 
modelling  effort  they  require:  models  in  local  balance, 
models  using  local  balance  as  an  approximation,  and  models 
in  not  in  local  balance. 


6.1  Models  in  Local  Balance 


Models  in  local  balance  have  two  great  advantages:  they 
require  only  gross  measurement  data  and  no  information  about 
the  dynamics  of  the  system’s  operation,  and  they  can  be 
solved  by  computational  algorithms  that  are  typically  many 
orders  of  magnitude  more  efficient  than  the  algorithms  for 
more  complex  models.  While  they  certainly  cannot  capture  as 
much  detail  as  models  using  more  sophisticated  techniques, 
all  the  authors  of  case  studies  using  the  absolute  modelling 
approach  together  with  local  balance  techniques  express 
their  satisfaction  with  the  results  (e.  g.  [  BW  74,  GIAH76]). 
Also,  the  fact  that  there  are  no  reports  on  more 
sophisticated  absolute  models  (the  published  absolute  models 
using  global  balance  techniques  satisfy  local  balance  as 
well,  e.g,  [M00B71,  CHHJ73,  801371])  seems  to  suggest  that 
there  are  certain  problems  in  their  application  to  practical 
case  studies. 

The  notion  of  local  balance  can  certainly  also  be 
applied  to  birth-death  models.  There  it  covers  basically 
models  as  described  in  [KLEI75,  chapter  3],  These  models 
can  be  extended  to  distinguish  several  classes  of  customers. 


115 


Batch  gobs 


The  turnaronod  tiie  of  batch  jobs  consists  of  three 
different  periods: 

•  reading  tine 

•  ti*e-in-syste*  ' 

•  waiting  tiae  for  print  and  printing  tiae 

To  obtain  the  aean  tiae-in-systea,  ae  use  a  hierarchical 
Bodel*  A  queueing  network  aodel  on  the  operating  systea 
level  supplies  the  service  rates  for  a  birth-death  aodel  on 
the  user  interface  level  that  has  an  infinite  source  of 
custoaers.  Assuaing  that  the  jobs  arrive  independently  of 
each  other,  an  arrival  rate  can  be  calculated: 


arrival  rate 


jobs  completed 
aeasureaent  interval 


By  applying  Little*s  foraula  to  the  mean  queue  length  of 
the  birth-death  aodel  we  obtain  the  aean  tiae-in-systea  of 
the  birth-death  aodel. 

The  reading  and  printing  for  tasks  is  noraally  handled 
by  a  spooling  system  that  operates  rather  independently  of 
the  rest  of  the  computer  systea.  Its  usage  of  modelled 
resources  such  as  the  CPU  or  I/O  channels  is  captured  on  the 
operating  system  level  by  the  interference  pattern  of  the 
system  aodel.  The  spooling  activities  can  be  viewed  as 
processes  separate  froa  the  actual  execution  of  the  tasks, 
and  they  can  be  aodelled  by  separate  birth-death  models. 
The  aean  time-in-system  calculated  with  these  spooling 
models  must  be  added  to  the  mean  time-in-system  of  the  aodel 
representing  the  rest  of  the  computer  system  to  obtain  the 
mean  turnaround  time. 


It  is  probably  not  very  realistic  to  model  card  readers. 
It  Bay  be  difficult  to  characterize  the  stochastic  processes 
that  describe  their  human  operators,  and  the  waiting  tine 
due  to  reading  is  small  in  most  practical  cases.  If  a 
reader  model  should  be  necessary,  it  could  be  done  in  a 
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manner  similar  to  the  modelling  of  the  printers  that  is 
described  below* 

For  the  printer  model  we  assume  a  number  of  printers  of 
the  same  type  that  are  available  for  all  requests.  The 
arrival  rates  are  the  throughput  rates  of  the  birth-death 
model  on  the  user  interface  level.  The  model  has  one  queue 
and  several  identical  servers  that  can  be  coalesced  to  cne 
server  with  load  dependent  capacity.  With  this  birth-death 
model,  we  calculate  the  mean  time  a  task  has  to  wait  for  a 
printer.  This  mean  waiting  time  should  be  the  same  for  all 
classes  if  no  priority  discipline  is  used.  Then  the  class 
dependent  printing  times  are  determined  from  the  class 
dependent  printing  demands  and  the  printer  capacities.  This 
method  of  calculating  the  turnaround  times  for  batch  jobs  is 
reported  in  [KSS  77].  It  can  be  extended  to  more  complex 
situations  if  a  more  sophisticated  birth-death  model  is 
a  pplied . 

Printer  Birth-Death  Model 

Model  Type  =  birth-death  model 

single  class 

local  balance  disciplines 

workload  vector 

load  dependent  server 

Modelling  Level:  user  interface  level 

Solution  Method:  Birth-death  formulas 

Some  new  parameters  must  be  added  to  the  system  model. 
In  the  resource  attributes,  the  capacities  of  the  printers 
must  be  specified.  They  can  be  calculated  as  described  in 
section  4.4.1.  The  workload  vector  for  the  printer  birth- 
death  model  contains  one  element  per  class,  the  product  of 
the  arrival  rate  times  the  average  printing  requirement  per 
task.  In  the  program  behaviour  model,  we  have  to  include 
the  average  number  of  print  lines  accounted  for  the  tasks. 
As  in  all  locally  balanced  models  we  do  not  need  to  consider 
dynamic  interference.  The  portion  of  the  printed  lines 
reported  by  the  software  monitor  but  not  accounted  is 
distributed  to  classes  in  proportion  to  the  number  of 
accounted  lines  by  class. 
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The  additional  measureaent  data  required  for  the  printer 

■odel  are: 

lines  per  printer 
accounted  lines  per  class 
utilizations  of  the  printers 


6 ,1^2  Load  Dependent  Servers 

Often  several  conponents  are  represented  as  one  server 
in  a  queueing  network  oodel.  This  is  particularly  true  for 
I/O  subsystems,  where  in  most  case  studies  a  channel  and  the 
disks  connected  to  it  are  modelled  as  a  single  server, 
normally  with  FCfS  discipline  and  exponential  service  time 
distributions  (see,  e.g.,  [ROSE76]).  Such  servers  do  not 
capture  the  complexity  of  the  entire  subsystem  adequately, 
so  at  least  load  dependent  servers  should  be  used  to  include 
more  detail. 

From  the  point  of  view  of  the  computational  solution 
methods,  modelling  the  load  dependence  of  servers  is 
relatively  simple.  Load  dependent  service  rates  are 
technically  feasible  for  local  balance  solution  algorithms 
[BUZE71,  BCMP75,  RK  76].  Their  practical  use,  however,  is 
restricted  by  the  difficulties  in  specifying  them  from 
measurement  data. 

There  are  several  ways  in  which  the  load  dependency  of 
the  capacities  could  be  determined.  By  doing  measurement 
experiments  while  systematically  varying  the 
multiprogramming  mix,  empirical  dependency  functions  can  be 
established.  Interpolation  can  be  used  for  situations  in 
which  no  measurements  are  performed.  The  measurements  for 
these  functions  are  probably  very  costly,  and  the 
interpolations  may  not  be  very  reliable. 

Alternatively,  the  load  dependent  service  rates  can  be 
calculated  using  a  detailed  queueing  network  model  on  the 
instruction  level  [ ZAH076,  IILH77].  The  servers  on  the 
lower  level  are  simpler  to  represent.  Even  if  they  show 
load  dependent  behaviour  like,  for  instance,  disks  in  their 
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seek  operations,  their  paraaeters  can  be  specified  acre 
easily.  The  major  problea  remains  the  specification  of  the 
input  parameters  for  these  models,  because  measurements  with 
the  required  amount  of  detail  are  normally  not  available. 

A  third  alternative  is  to  simulate  the  subsystem  in 
order  to  obtain  the  service  rates  for  the  composite  server 
that  represents  that  subsystem  in  the  queueing  network 
model.  This  way,  complex  algorithms  such  as  shortest  seek 
time  first  for  disks  can  be  modelled.  However,  the 
determination  of  the  input  parameters  is  the  same  problem 
for  the  simulation  as  for  the  analytic  model. 

The  ircst  appropritate  method  of  specifying  the  load 
dependent  service  for  queueing  network  models  would  be  to 
use  a  queueing  model  on  the  instruction  level.  But  until 
mere  accurate  measurement  tools  are  available,  load 
independent  service  rates  must  be  used,  then  adjusted  by 
calibration  in  order  to  match  the  job  throughput  of  the 
system  [R0SE76]. 


6.1.3  Memory  Constraints 


In  most  large  computer  systems,  the  multiprogramming  mix 
is  not  fixed  but  varies  according  to  the  memory  requirements 
of  currently  active  tasks  and  the  total  load  on  the  system. 
The  restriction  of  the  multiprogramming  mix  due  to  the 
limited  amount  of  real  memory  in  a  system  can  be  represented 
in  queueing  models. 

Brown,  Browne,  and  Chandy  present  a  modelling  method 
for  non-paged  memory  [BBC  77].  With  slight  changes,  this 
method  is  also  applicable  for  virtual  memory  systems.  Its 
essential  component  is  a  new  way  of  calculating  the  load 
dependent  service  rates  for  the  birth-death  model  on  the 
user  interface  level  of  a  hierachical  model.  From  the 
scheduling  discipline  and  the  probability  distribution  of 
the  program  sizes,  the  conditional  probability  distribution 


h(n|m)  is  determined,  specifying  the  probability  that  n 
tasks  are  in  main  memory  given  that  m  tasks  are  ready.  This 
distribution  is  then  used  to  weight  the  throughput  rates, 
r  (n) ,  of  the  gueueing  network  model  on  the  operating  system 
level.  The  load  dependent  service  rates  g (m)  for  the  birth- 
death  model  on  the  user  interface  level  are  calculated  as: 

m 

g(m)  =  r  (n)  ♦  h(nlra) 

n=  1 

Only  slight  changes  in  the  modelling  procedure  defined 
in  chapter  4  are  required  to  include  this  method  of 
modelling  memory  constraints.  The  evaluation  of  the 
queueing  network  model  and  the  birth-death  model  can  proceed 
as  before.  To  provide  the  parameters  for  the  calculation  of 
the  conditional  probabilities  h(n|m),  the  program  behaviour 
model  must  include  the  distribution  of  the  tasks*  memory 
sizes.  This  distribution  can  be  acquired  by  accounting 
monitors  such  as  SMF. 


6 .1.4  Set-up  Server 

In  data  processing  systems  with  large  files  often  data 
volumes  are  kept  off  line  and  are  mounted  on  request. 
Because  the  time  a  task  spends  waiting  for  volumes  to  be 
mounted  can  be  very  long,  it  can  influence  the  performance 
of  a  system  quite  strongly.  Thus,  a  set-up  server  could  be 
included  in  queueing  models  of  such  systems  [ BW  74,  GIAH76]. 
The  set-up  waiting  time  can  be  modelled  on  the  user 
interface  level  or  on  the  operating  system  level,  depending 
on  whether  the  mounts  for  a  task  are  requested  before  or 
after  its  initiation  respectively.  The  set-up  server  will 
probably  be  a  composite  server  with  load  dependent  capacity, 
but  its  final  representation  in  the  queueing  model  must  be 
determined  based  on  the  measurement  data. 


The  introduction  of  a  set-up  server  requires  additional 
actions  at  all  modelling  stages.  Either  an  extra  server  has 
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to  be  added  to  the  queueing  network  model,  or,  if  the  mounts 
are  to  be  modelled  on  the  user  interface  level,  the  birth- 
death  model  has  to  be  replaced  by  a  more  complex  queueing 
model . 


In  the  system  model,  the  service  discipline  for  the  set¬ 
up  server  and  its  capacity  must  be  added  to  the  resource 
attributes.  The  mean  number  of  mounts  for  a  task  must  be 
included  in  the  model  workload  and  in  the  program  behaviour 
model.  Because  the  system  overhead  is  probably  negligible 
and  the  dynamic  interference  need  not  be  modelled,  the 
interference  pattern  need  not  be  changed. 


T  o 
set-up 
series 


determine  the  load  dependent  service  rates  of  the 
server  as  a  function  of  the  outstanding  requests,  a 
of  special  measurement  experiments  must  be  performed. 


6,2  Model  Using  Local  Balance  as  as  Approximation 
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6.2.1  Priority  Disciplines 

Priority  disciplines  can  have  a  great  impact  on  the 
response  times  of  tasks.  Thus,  if  the  mean  response  times 
of  a  computer  system  using  priority  scheduling  have  to  be 
calculated  by  a  model,  the  priority  discipline  should  be 
represented  in  the  model.  Two  methods  for  modelling 
priority  disciplines  are  described  in  section  2.3,  One  of 
them,  Sevcik's  method  [SEVC77a],  is  entirely  compatible  with 
local  balance  techniques. 
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The  use  of  priority  disicplines  in  queueing  models 
requires  special  consideration  only  in  the  parameter 
representation  and  in  the  solution  algorithm.  The  system 
model  and  thus  also  the  measurement  requirements  are  the 
same  as  for  models  using  local  balance  disciplines. 


6.2.2  Aggregated  Servers  for  Models  not  in  Local  Balance 


If  1 
systems  t 
resulting 
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ocal  balance  modelling  techniques  are  applied  to 
hat  do  not  satisfy  the  required  conditions,  the 
performance  measures  may  contain  severe  errors, 
case  is  that  the  service  discipline  of  a  server  is 
its  service  time  has  a  general,  nonexponential 
ion.  In  section  2.3  we  described  how  Norton’s 
can  be  applied  to  aggregate  the  subsystem  into  one 
server.  The  coefficient  of  variation  of  the 
time  distribution  is  calculated  from  the 
nts  of  variation  of  the  service  time  distributions 
subsystem’s  servers.  The  resulting  two  server 
system  is  then  solved  by  global  balance  technigues. 


In  a  recent  paper,  Keller  proposes  a  technigue  to  model 
limited  parallelism  within  a  subnetwork  of  a  computer  system 
['KELL77].  The  overall  modelling  procedure  is  the  same  as 
described  in  [SC  75]  (see  section  2.3).  The  essential  part 
of  Keller’s  paper  is  his  method  of  calculating  the 
coefficient  of  variation  of  the  aggregate  server  that 
represents  the  subnetwork  with  the  limited  parallelism. 


All  the  cases  discussed  above  reguire  considerably  more 
modelling  effort  than  models  in  local  balance.  In  the 
measurements,  the  actual  service  time  distributions  of  the 
modelled  computer  system  components  must  be  recorded,  and  in 
the  computational  algorithm,  the  linear  system  of  global 
balance  equations  must  be  solved.  In  the  following  we  want 
to  outline  the  changes  in  the  modelling  stages  that  are 
required  for  these  models. 
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Two  queueing  network  models  must  be  designed:  the  local 
balance  model  "for  the  subnetwork  and  the  two  server,  global 
balance  system  for  the  overall  model.  Both  are  models  on 
the  operating  system  level.  The  parameter  requirements  for 
the  submodel  are  straightforward.  They  are  the  same  as,  for 
instance,  for  the  model  described  in  chapter  5,  The  global 
balance  model  requires,  as  input  parameters,  the  service 
disciplines,  the  multiprogramming  mix,  and  the  structure  of 
the  generalized  Cox-type  servers  that  result  in  acceptable 
approximations  of  the  actual  service  time  distributions 
[COX  55],  The  system  model  must  be  general  enough  to 
provide  a  basis  for  the  parameter  estimation  of  both 
queueing  models. 

The  time  required  for  the  data  transfer  of  I/O  requests 
is  small  in  comparison  with  the  other  times  involved  such  as 
waiting  time,  seek  time,  and  rotational  delay.  The  I/O 
requests  to  particular  di rect- access  devices  may  come  from 
all  tasks  that  are  concurrently  active.  As  a  consequence, 
it  can  be  assumed  that  the  coefficients  of  variation  of  the 
I/O  service  time  distributions  are  properties  of  the 
resources  (i.e,,  the  channels  or  the  devices)  rather  than 
properties  of  the  programs  invoking  them.  Thus  the 
coefficients  of  variation  of  the  I/O  service  time 
distibutions  must  be  included  in  the  resource  attributes. 
The  capacities  of  the  I/O  servers  as  well  as  their 
coefficients  of  variation  are  normally  load  dependent. 

The  model  workload  for  the  global  balance  model  must  not 
only  describe  the  overall  workload  per  server,  but  also  the 
workload  dynamics.  It  must  contain  the  multiprogramming  mix 
and  the  mean  and  the  coefficients  of  variation  of  the 
service  time  distributions  for  the  two  servers. 

The  program  behaviour  model  must  also  capture  parts  of 
the  dynamics  of  the  user  tasks.  For  the  CPU  it  must  contain 
the  mean  and  the  coefficient  of  variation  of  the  logical 
burst  time  distribution.  The  I/O  related  paramemters  of  the 
program  behaviour  model  need  only  contain  the  number  of  I/O 
requests  to  the  particular  devices.  From  these 
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counts  the  probabilities  for  the  transitions  of  a  task  among 
the  devices  can  be  calculated. 

In  the  interference  pattern,  a  dynamic  interference 
function  naps  the  mean  and  the  coefficient  of  variation  of 
the  logical  CPU  burst  time  distribution  into  the 
corresponding  parameters  of  the  distribution  of  the  physical 
bursts.  This  interference  function  is  dependent  on  the 
multiprogramming  mix.  The  multiprogramming  mix  can  be 
determined  from  the  response  times  in  the  same  way  as  for 
the  local  balance  models. 

Most  of  the  parameters  of  this  system  model  can  be 
derived  from  the  same  gross  measurements  as  required  for 
local  balance  models.  However,  to  determine  the 
ccefficients  of  variation  of  all  the  distributions  involved 
and  to  establish  the  dynamic  interference  function,  event 
traces  must  be  recorded.  As  described  in  chapter  3,  event 
traces  require  much  more  effort  than  sampling  and  event 
counting,  and  the  results  are  much  more  difficult  to 
interpret. 


6.2.3  Multiple  Class  Birth-Death  Models 


In  hierarchical  models  generally,  the  throughput  rates 
of  the  queueing  network  models  on  the  operating  system  level 
are  used  as  service  rates  of  the  birth-death  models  on  the 
user  interface  level.  If  the  queueing  network  model  has 
several  classes  of  customers,  the  load  dependent  service 
rates  of  a  single  class  birth-death  model  cannot  be 
calculated  exactly.  The  input  of  the  birth-death  model,  the 
throughput  rates  of  the  queueing  network  models  attained 
under  various  multiprogramming  mixes,  must  be  weighted  with 
the  proportions  of  the  time  that  these  multiprogramming 
mixes  exist.  These  proportions,  however,  are  the  quantities 
that  are  to  be  determined  by  the  birth-death  models.  So  we 
have  a  circular  dependency. 
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In  [ KSS  77]  there  are  two  ways  proposed  to  solve  this 
problea.  By  making  an  initial  assumption  about  the  weights, 
a  first  approximation  can  be  calculated  that  is  subsequently 
refined  by  an  iterative  process.  However,  there  is  no 
guarantee  that  the  iteration  converges.  A  more  promising 
alternative  is  to  use  a  multiple  class  birth-death  model 
that  has,  for  each  possible  multiprogramming  mix,  the 
corresponding  throughput  rate  vector  of  the  queueing  network 
model  as  its  service  rate  vector.  This  solution  method  is 
exact  within  the  modelling  assumptions.  The  most  convenient 
choice  wculd  certainly  be  birth-death  models  in  local 
balance.  If,  however,  particular  scheduling  disciplines 
such  as  priority  scheduling  are  to  be  modelled,  or  if  there 
are  restrictions  on  the  possible  multiprogramming  mixes  (for 
instance  that  the  sum  of  the  multiprogramming  levels  of  two 
classes  not  exceed  a  certain  limit),  more  general  birth- 
death  models  can  be  designed. 

For  multiple  class  birth-death  models,  the  modelling 
stages  do  not  require  extensive  changes.  The  parameter 
representation  is  straightforward,  so  that  only  the  solution 
algorithm  for  the  birth-death  models  has  to  be  designed 
specially . 


6.3  Models  not  in  Local  Balance 


Models  not  in  local  balance  are,  in  general,  solved  by 
global  balance  techniques  that  allow  in  principle  an 
extremely  high  modelling  accuracy.  By  Cox*  method  of  stages 
[COX  55],  all  probability  distributions  with  rational 
Laplace  transforms  can  be  represented  arbitrarily  closely. 
Very  complex  service  disciplines  can  be  incorporated  into 
the  global  balance  equations.  However,  in  practice  the 
modelling  accuracy  of  global  balance  models  is  limited  by 
the  measurement  accuracy  of  the  event  traces,  and  their 
feasibility  is  severely  restricted  by  the  computational 
effort  required  for  their  solution.  The  number  of  linear 
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equations  even  for  simple  models  can  be  very  large,  so  that 
their  solution  becomes  economically  infeasible. 


The  practical  importance  of  global  balance  models  is 
currently  restricted  to  the  verification  of  more  efficient 
approximation  methods.  Recently  published  case  studies 
using  the  absolute  jnodelling  approach,  e.g.,  [GIAM76, 
NEIL76,  R0SE76,  BBC  77],  use  local  balance  techniques. 


In  the  most  general  case  a  global  balance  queueing  model 
is  specified  in  terms  of  its  global  balance  equations.  In  a 
mere  restricted  context,  the  generation  of  the  equations  can 
be  automated,  and  the  input  parameters  for  the  solution 
algorithm  may  be  problem  oriented  variables.  For  instance, 
the  QSOLVE  system  [LEVY77  ]  requires  as  input  the 
multiprogramming  mix,  the  service  disciplines,  the 
transition  probabilities  of  jobs  among  the  servers,  and  the 
means  and  variances  of  the  service  time  distributions. 


The  system  model  for  global  balance  models  and  thus  also 
the  required  measurement  data  are  the  same  as  for  the 
approximations  described  in  section  6.2.2.  Between  these 
approximations  and  the  global  balance  solution  method,  a 
trade-off  of  accuracy  versus  effort  can  be  made.  Building 
on  the  same  system  model,  the  approximations  have  more 
efficient  solution  algorithms,  but  there  is  a  larger  loss  of 
accuracy  in  the  parameter  representation  and  in  the  course 
of  the  solution  than  for  global  balance  techniques. 
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7.  Conclusions 


7.1  Parameter  Specification  Methodology 


The  methodology  for  the  parameter  estimation  and 
representation  proved  to  be  helpful  in  the  modelling  studies 
described  in  chapters  4  and  5.  Once  the  measurement  data 
were  collected,  the  models  were  built  quickly  and  without 
too  much  effort.  The  extensions  described  in  chapter  6  show 
that  the  methodology  is  sufficiently  versatile  to  be  used  in 
general  modelling  contexts.  The  system  model  can  explicitly 
represent  much  more  detail  of  the  system  than  the  queueing 
network  model,  because  it  is  not  restricted  by  any  solution 
method  that  requires  a  particular  representation  of  the 
parameters.  The  separation  of  the  system  model  parameters 
into  their  logical  components  has  several  benefits: 

•  The  separation  gives  a  better  understanding  of  the 
system  and  its  operation, 

•  The  separation  provides  more  formal  transitions  from 
the  measurement  data  to  the  input  parameters  of  the 
queueing  network  model. 

•  The  separation  makes  the  modelling  decisions  more 
explicit  and  restricts  them  to  the  parameters 
immediately  involved. 

•  The  separation  permits  a  more  explicit  representation 
of  changes  to  the  computer  sjstem  in  the  model  by 
modelling  more  features  of  the  computer  system 
explicitly , 

In  order  to  fully  exploit  the  parameter  specification 
methodology,  the  models  must  be  based  on  more  detailed 
ujpasurement  data  obtained  in  more  measurement  experiments. 
In  particular,  for  a  reliable  and  accurate  specification  of 
the  interference  pattern,  we  need  a  broader  foundation.  We 
must  identify  the  system  parameters  on  which  the  functions 
depend  that  map  the  logical  service  requests  of  the  tasks 
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into  physical  operations  of  the  computer  systeai,  Also^  we 
must  determine  what  service  requests  cause  what  service 
demands  by  the  operatinq  system.  To  this  end,  we  must  know 
more  about  the  operating  system  logic  and  represent  it  in 
the  interference  functions.  Finally,  we  must  know  more 
about  the  location  of  the  system  data  sets,  whose  usage  is 
net  accounted,  and  about  their  use. 


The  interference  functions  in  this  study,  as  far  as  they 
could  be  established  at  all,  are  specified  just  by  the 
factors  that  represent  their  slopes.  Even  if  we  assume 
linear  interference  functions,  we  should  not  assume  that 
they  all  go  through  the  origin.  We  should  rather  perform 
measurements  for  different  load  situations  and  then 
approximate  these  functions  as  seems  appropriate  from  the 
measurement  data. 


This  last  point  also  holds  for  the  capacities  of  the 
hardware  resources.  We  know  that  the  capacities  of  all 
resources  are  load  dependent.  Even  the  CPU  instruction  rate 
depends  tc  a  great  extent  on  the  job  mix  and  on  the  relation 
of  the  problem  state  and  the  system  state,  as  problem  tasks 
normally  have  a  much  greater  high  speed  cache  hit  ratio  than 
operatinq  system  tasks  [SDTH77].  It  would  probably  improve 
the  reliability  of  the  model  under  different  load  situations 
considerably  if  we  could  establish  load  dependent  capacities 
that  are  based  on  a  broad  set  of  measurement  experiments. 


The  system  model,  in  particular  the  interference 
functions,  should  not  only  be  improved  by  founding  it  on  a 
wider  basis  of  measurement  data,  but  it  should  also  be 
validated.  Once  a  system  model  for  a  particular  computer 
system  is  specified,  it  should  be  used  to  model  the  system 
under  a  load  different  from  the  load  used  for  its 
specification.  This  way  the  confidence  in  the  predictive 
ability  of  the  model  can  be  established. 


The  numerical 
into  the  parameters 
parameters  of  the 
if  it  must  be  done 


transformation  of  the  measurement  data 
of  the  system  model  and  into  the  input 
solution  algorithms  is  extremely  tedious 
manually.  In  order  to  allow  the  analyst 
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tc  concentrate  on  the  essential  decisions^  an  automatic 
system  should  be  designed  to  perforin  these  transformations 
without  binding  any  modelling  decisions  or  hiding  any 
assumptions.  For  easy  use  and  quick  evaluation  of 
alternatives,  an  interactive  system  would  be  preferable. 


7.2  The  Models 


The  level  of  detail  represented  in  both  the  single  class 
model  and  the  multiple  class  model  seems  to  be  appropriate 
and  consistent.  The  results  are  reasonably  accurate  for 
both  models  considering  the  modelling  assumptions,  the 
decicions  involved,  and  the  measurement  data  on  which  they 
are  based. 

Building  a  single  class  model  first  and  then  proceeding 
tc  a  multiple  class  model  proved  to  be  very  helpful.  The 
limited  number  of  parameters  of  the  single  class  model 
allowed  us  to  validate  some  of  the  modelling  decisions  that 
could  not  have  been  verified  easily  in  the  multiple  class 
model.  The  decisions  tc  model  channels  and  disks,  as 
opposed  to  either  one  or  the  other,  and  to  calculate  the 
capacity  of  each  disk  individually,  were  proved  valid  in  the 
single  class  model.  However,  there  is  one  disadvantage  to 
this  decision:  the  capacities  determined  this  way  are  load 
dependent,  so  they  may  not  hold  for  other  loads  and,  thus, 
they  restrict  the  predicitve  ability  of  the  model.  As  long 
as  the  utilizations  of  the  disks  are  low,  this  restriction 
is  not  very  severe,  and  it  can  be  alleviated  by  providing 
more  measurement  data  for  a  better  statistical  foundation  of 
the  capcities. 

The  distribution  of  the  overhead  and  the  determination 
of  the  multiprogramming  level  under  which  to  evaluate  the 
model  seem  to  be  the  most  critical  points  in  this  modelling 
study.  In  order  to  obtain  confidence  in  the  decisions  in 
these  matters,  validation  should  be  done  using  measurement 
data  that  is  obtained  from  a  different  workload. 
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he  measurements  were  not  performed  on  an  ideal 
oad.  The  large  differences  in  the  class  elapsed  times 
her  with  the  global  load  balancing  by  the  SRM  caused 
variations  in  the  operational  environment,  for 
nee,  in  the  mult iprograaming  aix  and  the  queue  lengths, 
ystem  was  not  even  approximately  in  equilibrium.  The 
load  of  the  benchmark  is  not  very  realistic  in  its 
ion  of  mean  think  time  and  mean  response  time, 
ermore,  the  fact  that  all  25  simulated  terminal 
ons  use  the  same  script  means  that  they  request  the 
resources  at  about  the  same  time.  This  phase  behaviour 
ases  the  nonhomogeneity  of  the  workload  as  well.  This 
mark  was  not  designed  to  be  used  in  connection  with 
ing  network  models,  but  for  the  tuning  purposes  of 
As  a  consequence,  the  statistical  basis  of  the 
oad  should  be  given  much  attention.  The  end  effects 
d  be  eliminated  by  providing  longer  job  streams  and 
nating  the  measurements  when  the  first  job  class  stream 
hes.  Each  stream  should  contain  jobs  that  are 
stically  independent  with  respect  to  their  behaviour 
heir  resource  requirements. 


The  accuracy  of  the  sampling  measurements  should  be 
improved  by  finding  some  heuristic  for  choosing  the 
appropriate  sampling  cycle  times.  To  give  guidelines  for  an 
estimation  of  the  accuracy  of  the  values  obtained  by 
sampling,  not  only  the  mean,  but  also  the  variance  should  be 
reported.  As  the  statistical  literature  does  not  offer  very 
much  help  for  estimating  confidence  intervals  for  this  kind 
of  sampling,  seme  fundamental  research  should  be  done  in 
this  area. 


The  amount  of  detail 

insufficient  particularly  in 
of  the  physical  operation  of 
specific  jcb  classes.  SflF  c 
service  requests,  but  it 


in  the  measurement  data  is 
one  respect:  too  great  a  part 
the  system  is  not  accounted  to 
aptures  most  of  the  logical 
cannot  easily  be  determined 


requests. 
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quantitatively  how  these  requests  relate  to  the  physical 
operations,  particularly  in  the  I/O  subsystem.  The  causes 
and  quantities  of  resource  usage  by  the  operating  system  are 
difficult  to  identify. 

Despite  all  these  disadvantages  the  measurement  data 
show  good  consistency.  This  can  be  observed  in  the  good 
results  of  the  single  class  model,  for  which  the  above 
mentioned  defects  are  not  so  crucial.  The  concept  of  the 
division  of  SHF  and  PMF,  one  providing  the  job  related  data 
and  the  other  system  related  data,  seems  to  be  reasonable  as 
long  as  the  measurement  records  are  written  to  the  same  data 
set. 


7.4  Monitors  for  Queueing  Network  Models 


In  this  section  we  want  to  summarize  the  characteristics 
a  monitor  should  have,  if  it  is  to  be  used  in  connection 
with  local  balance  models.  Models  that  require  a  more 
accurate  workload  description  than  the  workload  vector  gives 
are  not  considered  here.  Measurements  for  such  models,  for 
example,  measurements  to  determine  the  second  moments  of  the 
burst  lengths,  require  event  traces  and  more  sophisticated 
moasurement  tools.  A  specification  for  such  tools  requires 
and  deserves  further  research.  For  models  for  which  the 
workload  vector  is  an  adequate  workload  description,  a 
combination  of  sampling  and  event  tracing  seems  to  be 
appropriate . 
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irstant  into  one  record,  the  sampling  monitor  should  provide 
the  possibility  of  evaluating  statistical  correlations  among 
the  system  parameters.  Giving  special  attention  to  some 
important  parameters  like  the  multiprogramming  mix  and  the 
CPU  state,  it  should  indicate  to  what  degree  the  system 
operates  in  equilibrium.  For  specifying  the 
multiprogramming  mix,^  a  sampling  monitor  should  provide  a 
joint  probability  distribution  of  the  user  multiprogramming 
levels. 


The  event  trace  does  not  need  to  give  data  on  single 
tasks.  A  breakdown  by  user  classes  is  sufficient  for 
modelling  purposes.  This  breakdown,  however,  should  be 
extended  much  further  than  in  existing  monitors.  It  should 
net  only  capture  the  logical  service  requests  (e.g.  EXCPs, 
page  faults) ,  but  also  the  physical  operations  resulting 
from  them  (e.g,  SIOs) ,  It  should  also  allow  a  better 
breakdown  by  classes  of  the  operating  system  activities, 
although  there  are  certainly  some  activities  whose  cause 
cannot  be  traced,  so  a  complete  accounting  of  the  resource 
usage  to  user  classes  is  not  possible.  For  validation 
purposes,  the  event  trace  monitor  should  collect  total 
resident  times  and  total  ti mes-in-system  by  class,  as  well 
as  the  response  times  for  interactive  classes. 


Finally,  both  monitor  types  should  identify  their  own 
resource  usage  as  much  as  possible. 


2 r 5  Suggestions  for  an  Improvement  of  RAF  and  SHF 


The  software  monitors  SflF  and  RMF  proved  to  be  very 
helpful  in  this  study  although  they  were  not  especially 
designed  to  provide  data  for  queueing  network  models. 
However,  some  of  their  shortcomings  were  also  discovered. 
In  the  following  section,  we  want  to  suggest  some 
modifications  that  should  not  be  too  difficult  to  implement, 
and  that  would  improve  the  usability  of  these  monitors  for 
queueing  network  models  considerably.  First,  however,  we 
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want  to  comment  on  the  class  definition  in  MVS  operat 
systems.  The  notion  in  MVS  that  comes  closest  to  an  id 
class  definition  and  that  is  used  to  discriminate  among 
jcbs  in  the  system  is  the  performance  group  [IBM76 
Because  some  of  the  RMF  data,  like  the  mu Iti prog ramm 
levels  for  example,  are  sampled  by  domain  [IBM76d]  rat 
than  by  performance  group,  the  performance  groups  sho 
each  have  their  own  set  of  domains  not  intersecting  w 
domains  of  other  performance  groups.  Then  a  domain  can 
viewed  as  a  subclass. 
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7.5.1  RMF  Modifications 

CPU  Activity  Report: 

The  CPU  activity  report  should  at  least  distinguish 
system  and  problem  state.  It  would  be  preferable  if  it 
would  also  distinguish  CPU  usage  by  performance  group. 

Paging  Activity  Report: 

The  paging  rates  for  the  address  spaces,  the  VIOs,  and 
the  swaps  given  in  this  report  are  not  consistent  with 
the  numbers  of  pages  transferred  reported  in  the  step 
termination  records  of  SMF.  The  report  about  the  swaps 
distinguishing  swapping  reasons  should  be  broken  down  by 
performance  groups. 

Workload  activity  report: 

The  number  of  swaps  reported  here  is  inconsistent  with 
the  numbers  given  by  the  SMF  step  termination  records. 
The  sum  of  the  residence  times  of  the  transactions  ended 
in  the  interval  should  be  reported.  This  would  allow  us 
to  compute  average  multiprogramming  levels  for  the 
performance  groups  and  the  domains. 

The  Channel  Activity  Report,  the  I/O  Device  Activity 
Report,  and  the  Page/Swap  Data  Set  Activity  Report  should 
produce  their  data  broken  down  by  performance  groups.  As 
^  5  breakdown  might  involve  a  considerable  amount  of 
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overhead,  its  benefits  mast  be  traded  off  against  its 
disadvantages.  Finally,  an  Operating  System  Activity  Report 
could  be  produced  which  provides  information  about  the  usage 
of  resources  by  the  operating  system,  about  the  usage  of 
operating  system  data  sets,  and  about  the  behaviour  of  the 
operating  system  tasks  (started  tasks). 


7.5.2  SMF  Modifications 


The  SMF  step  termination  records  should  contain  all 
possibly  accountable  resource  usage.  If  a  task  changes  its 
domain  during  execution,  a  separate  record  should  be 
produced  for  each  domain.  The  records  should  also  contain 
the  domain  number,  not  only  the  performance  group  number. 

The  CPU  times  attained  under  System  Reguest  Blocks 
(SRBs)  and  the  CPU  times  attained  under  Task  Control  Blocks 
(TCBs)  both  contain  some  overhead,  but  not  the  same 
overhead.  It  would  be  better  for  modelling  purposes  if  one 
variable  would  contain  all  CPU  time  that  can  be  traced  to  a 
particular  task.  The  device  related  data  should  not  only 
give  EXCPs,  but  also  SIOs,  and  they  should  include  operating 
system  I/Os  for  a  particular  task  as  far  as  they  are 
accountable.  The  paging  data  as  well  should  give  not  only 
the  number  of  pages  transferred,  but  also  the  SIOs,  and  they 
should  identify  the  disks  to  which  or  from  which  the 
transfers  occur.  The  number  of  swaps  should  be  broken  down 
by  the  reasons  for  the  swaps.  The  spooling  activity  of  the 
tasks  should  be  reported  in  the  step  termination  record 
rather  than  in  the  Hasp  Purge  record. 
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DATA  BASES 

J.H.  Gilles  Farley  and  Stewart  A.  Schuster,  March  1975 

CSRG-54  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER 
PROGRAM  ENGINEERING 

J.V.  Guttag  (ed.) ,  Third  Edition,  April  1975 

CSRG-55  STRUCTURED  SUBSETS  OF  THE  PL/1  LANGUAGE 

Richard  C.  Holt  and  David  B.  Wortman,  May  1975 


CSRG-56  FEATURES  OF  A  CONCEPTUAL  SCHEMA 

D,  Tsichritzis,  June  1975  [Proceedings  Very  Large 
Data  Base  Conference,  1975] 

*  CSRG-57  MERLIN;  TOWARDS  AN  IDEAL  PROGRAMMING  LANGUAGE 

Eric  C.R.  Hehner,  July  1975 

CSRG-58  ON  THE  SEMANTICS  OF  THE  RELATIONAL  DATA  MODEL 
Hans  Albrecht  Schmid  and  J.  Richard  Swenson, 

July  1975  [Proceedings  of  the  ACM  SIGMOD  Conference, 
1975] 

*  CSRG-59  THE  SPECIFICATION  AND  APPLICATION  TO  PROGRAMMING 

OF  ABSTRACT  DATA  TYPES 

John  V.  Guttag,  September  1975  [Ph.D.  Thesis,  DCS,  1975 

CSRG-60  NORMALIZATION  AND  FUNCTIONAL  DEPENDENCIES  IN  THE 
RELATIONAL  DATA  BASE  MODEL 
Phillip  Alan  Bernstein,  October  1975 
[Ph.D.  Thesis,  DCS,  1975] 

*  CSRG-61  LSL;  A  LINK  AND  SELECTION  LANGUAGE 

D.  Tsichritzis,  November  1975  [Proceedings  ACM 
SIGMOD  Conference,  1976] 

*  CSRG-62  COMPLEMENTARY  DEFINITIONS  OF  PROGRAMMING 

LANGUAGE  SEMANTICS 

James  E.  Donahue,  November  1975 

[Ph.D.  Thesis,  DCS,  1975] 

CSRG-63  AN  EXPERIMENTAL  EVALUATION  OF  CHESS  PLAYING 
HEURISTICS 

Lazio  Sugar,  December  1975  [M.Sc.  Thesis,  DCS,  1975] 

CSRG-64  A  VIRTUAL  MEMORY  SYSTEM  FOR  A  RELATIONAL 
ASSOCIATIVE  PROCESSOR 

S.A.  Schuster,  E.A.  Ozkarahan,  and  K.C.  Smith, 

February  1976  [Proceedings  National  Computer 
Conference  1976,  v.45,  pp. 855-862] 

*  CSRG-65  PERFORMANCE  EVALUATION  OF  A  RELATIONAL 

ASSOCIATIVE  PROCESSOR 

E. A.  Ozkarahan,  S.A.  Schuster,  and  K.C.  Sevcik, 

February  1976  [ACM  Transactions  on  Database 
Systems,  v.1,  n:4,  December  1976] 

CSRG-66  EDITING  COMPUTER  ANIMATED  FILM 
Michael  D.  Tilson,  February  1976 
[M.Sc.  Thesis,  DCS,  1975] 

CSRG-67  A  DIAGRAMMATIC  APPROACH  TO  PROGRAMMING  LANGUAGE 
SEMANTICS 

James  R.  Cordy,  March  1976  [M.Sc.  Thesis,  DCS,  1976] 

*  CSRG-68  A  SYNTHETIC  ENGLISH  QUERY  LANGUAGE  FOR  A 

RELATIONAL  ASSOCIATIVE  PROCESSOR 

L . Kerschberg ,  E.A.  Ozkarahan,  and  J.E.S.  Pacheco 
April  1976 


CSRG-69  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER  PROGRAM  ENGINEERING 

D.  Barnard  and  D,  Thompson  (Eds.) ,  Fourth  Edition,  May  1976 

CSRG-70  A  TAXONOMY  OF  DATA  MODELS 

L.  Kerschberg,  A.  Klug,  and  D.  Tsichritzis,  May  1976 
[Proceedings  Very  Large  Data  Base  Conference,  1976] 

CSRG-71  OPTIMIZATION  FEATURES  FOR  THE  ARCHITECTURE  OF  A 
DATA  BASE  MACHINE 

E. A.  Ozkarahan  and  K.C,  Sevcik,  May  1976 

CSRG-72  THE  RELATIONAL  DATA  BASE  SYSTEM  OMEGA  -  PROGRESS  REPORT 
H.A,  Schmid  (ed. ) »  P-A.  Bernstein  (ed.),  B.  Arlow, 

R.  Baker  and  S,  Pozgaj,  July  1976 

CSRG-73  AN  ALGORITHMIC  APPROACH  TO  NORMALIZATION  OF 
RELATIONAL  DATA  EASE  SCHEMAS 
P.A.  Bernstein  and  C,  Beeri,  September  1976 

CSRG-r74  A  HIGH-LEVEL  MACHINE-ORIENTED  ASSEMBLER  LANGUAGE 
FOR  A  DATA  BASE  MACHINE 

E.A.  Ozkarahan  and  S.A.  Schuster,  October  1976 

CSRG-75  DO  CONSIDERED  OD:  A  CONTRIBUTION  TO  THE 
PROGRAMMING  CALCULUS 
Eric  C.R.  Hehner,  November  1976 

CSRG-76  "SOFTWARE  HUT":  A  COMPUTER  PROGRAM  ENGINEERING 
PROJECT  IN  THE  FORM  OF  A  GAME 
J.J.  Horning  and  D. B,  Wortman,  November  1976 

CSRG-77  A  SHORT  STUDY  OF  PROGRAM  AND  MEMORY  POLICY  BEHAVIOUR 
G.  Scott  Graham,  January  1977 

CSRG-78  A  PANACHE  OF  DBMS  IDEAS 

D.  Tsichritzis,  February  1977 

CSRG-79  THE  DESIGN  AND  IMPLEMENTATION  OF  AN  ADVANCED  LALR 
PARSE  TABLE  CONSTRUCTOR 

David  H.  Thompson,  April  1977  [M.Sc.  Thesis,  DCS,  1976] 

CSRG-80  AN  ANNOTATED  BIBLIOGRAPHY  ON  COMPUTER  PROGRAM  ENGINEERING 
D.  Barnard  (Ed.),  Fifth  Edition,  May  1977 

CSRG-81  PROGRAMMING  METHODOLOGY:  AN  ANNOTATED  BIBLIOGRAPHY  FOR 
IFIP  WORKING  GROUP  2.3 

Sol  J.  Greenspan  and  J.J.  Horning  (Eds.),  First  Edition, 

May  1977 

CSRG-82  NOTES  ON  EUCLID 

edited  by  W.  David  Elliot  emd  David  T.  Barnard, 

August  1 977 

CSRG-83  TOPICS  IN  QUEUEING  NETWORK  MODELING 
edited  by  G.  Scott  Graham,  July  1977 


CSRG-84  TOWARD  PROGRAM  ILLUSTRATION 

Edward  Yarwood,  September  1977  [M.Sc.  Thesis,  DCS,  1974] 


CSRG-85  CHARACTERIZING  SERVICE  TIME  AND  RESPONSE  TIME  DISTRIBUTIONS 
IN  QUEUEING  NETWORK  MODELS  OF  COMPUTER  SYSTEMS 
Edward  D,  Lazowska^  September  1977 
[Ph.D.  Thesis^  DCS,  1977] 

CSRG-86  MEASUREMENTS  OF  COMPUTER  SYSTEMS  FOR 
QUEUEING  NETWORK  MODELS 
Martin  G,  Kienzle,  October  1977 
[M.Sc.  Thesis,  DCS,  1977] 
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